Diffstat (limited to 'nvidia1/Custom+CUDA+Kernels+in+Python+with+Numba.ipynb')
| -rw-r--r-- | nvidia1/Custom+CUDA+Kernels+in+Python+with+Numba.ipynb | 5434 |
1 files changed, 5434 insertions, 0 deletions
diff --git a/nvidia1/Custom+CUDA+Kernels+in+Python+with+Numba.ipynb b/nvidia1/Custom+CUDA+Kernels+in+Python+with+Numba.ipynb new file mode 100644 index 0000000..9807623 --- /dev/null +++ b/nvidia1/Custom+CUDA+Kernels+in+Python+with+Numba.ipynb @@ -0,0 +1,5434 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "<a href=\"https://www.nvidia.com/dli\"> <img src=\"images/DLI Header.png\" alt=\"Header\" style=\"width: 400px;\"/> </a>" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Custom CUDA Kernels in Python with Numba\n", + "\n", + "In this section we will go further into our understanding of how the CUDA programming model organizes parallel work, and will leverage this understanding to write custom CUDA **kernels**, functions which run in parallel on CUDA GPUs. Custom CUDA kernels, in utilizing the CUDA programming model, require more work to implement than, for example, simply decorating a ufunc with `@vectorize`. However, they make possible parallel computing in places where ufuncs are just not able, and provide a flexibility that can lead to the highest level of performance.\n", + "\n", + "This section contains three appendices for those of you interested in further study: a variety of debugging techniques to assist your GPU programming, links to CUDA programming references, and coverage of Numba supported random number generation on the GPU." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Objectives\n", + "\n", + "By the time you complete this section you will be able to:\n", + "\n", + "* Write custom CUDA kernels in Python and launch them with an execution configuration.\n", + "* Utilize grid stride loops for working in parallel over large data sets and leveraging memory coalescing.\n", + "* Use atomic operations to avoid race conditions when working in parallel." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## The Need for Custom Kernels" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Ufuncs are fantastically elegant, and for any scalar operation that ought to be performed element wise on data, ufuncs are likely the right tool for the job.\n", + "\n", + "As you are well aware, there are many, if not more, classes of problems that cannot be solved by applying the same function to each element of a data set. Consider, for example, any problem that requires access to more than one element of a data structure in order to calculate its output, like stencil algorithms, or any problem that cannot be expressed by a one input value to one output value mapping, such as a reduction. Many of these problems are still inherently parallelizable, but cannot be expressed by a ufunc.\n", + "\n", + "Writing custom CUDA kernels, while more challenging than writing GPU accelerated ufuncs, provides developers with tremendous flexibility for the types of functions they can send to run in parallel on the GPU. Furthermore, as you will begin learning in this and the next section, it also provides fine-grained control over *how* the parallelism is conducted by exposing CUDA's thread hierarchy to developers explicitly.\n", + "\n", + "While remaining purely in Python, the way we write CUDA kernels using Numba is very reminiscent of how developers write them in CUDA C/C++. For those of you familiar with programming in CUDA C/C++, you will likely pick up custom kernels in Python with Numba very rapidly, and for those of you learning them for the first time, know that the work you do here will also serve you well should you ever need or wish to develop CUDA in C/C++, or even, make a study of the wealth of CUDA resources on the web that are most commonly portraying CUDA C/C++ code." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Introduction to CUDA Kernels\n", + "\n", + "When programming in CUDA, developers write functions for the GPU called **kernels**, which are executed, or in CUDA parlance, **launched**, on the GPU's many cores in parallel **threads**. When kernels are launched, programmers use a special syntax, called an **execution configuration** (also called a launch configuration) to describe the parallel execution's configuration.\n", + "\n", + "The following slides (which will appear after executing the cell below) give a high level introduction to how CUDA kernels can be created to work on large datasets in parallel on the GPU device. Work through the slides and then you will begin writing and executing your own custom CUDA kernels, using the ideas presented in the slides." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + " <iframe\n", + " width=\"640\"\n", + " height=\"390\"\n", + " src=\"https://view.officeapps.live.com/op/view.aspx?src=https://developer.download.nvidia.com/training/courses/C-AC-02-V1/AC_CUDA_Python_1.pptx\"\n", + " frameborder=\"0\"\n", + " allowfullscreen\n", + " ></iframe>\n", + " " + ], + "text/plain": [ + "<IPython.lib.display.IFrame at 0x7f8d844ee668>" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from IPython.display import IFrame\n", + "IFrame('https://view.officeapps.live.com/op/view.aspx?src=https://developer.download.nvidia.com/training/courses/C-AC-02-V1/AC_CUDA_Python_1.pptx', 640, 390)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## A First CUDA Kernel\n", + "\n", + "Let's start with a concrete, and very simple example by rewriting our addition function for 1D NumPy arrays. CUDA kernels are compiled using the `numba.cuda.jit` decorator. 
`numba.cuda.jit` is not to be confused with the `numba.jit` decorator you've already learned which optimizes functions **for the CPU**.\n", + "\n", + "We will begin with a very simple example to highlight some of the essential syntax. Worth mentioning is that this particular function could in fact be written as a ufunc, but we choose it here to keep the focus on learning the syntax. We will be proceeding to functions more well suited to being written as a custom kernel below. Be sure to read the comments carefully, as they provide some important information about the code." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "from numba import cuda\n", + "\n", + "# Note the use of an `out` array. CUDA kernels written with `@cuda.jit` do not return values,\n", + "# just like their C counterparts. Also, no explicit type signature is required with @cuda.jit\n", + "@cuda.jit\n", + "def add_kernel(x, y, out):\n", + " \n", + " # The actual values of the following CUDA-provided variables for thread and block indices,\n", + " # like function parameters, are not known until the kernel is launched.\n", + " \n", + " # This calculation gives a unique thread index within the entire grid (see the slides above for more)\n", + " idx = cuda.grid(1) # 1 = one dimensional thread grid, returns a single value.\n", + " # This Numba-provided convenience function is equivalent to\n", + " # `cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x`\n", + "\n", + " # This thread will do the work on the data element with the same index as its own\n", + " # unique index within the grid.\n", + " out[idx] = x[idx] + y[idx]" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "n = 4096\n", + "x = np.arange(n).astype(np.int32) # [0...4095] on the host\n", + "y = np.ones_like(x) # [1...1] on the host\n", + "\n", + "d_x = 
cuda.to_device(x) # Copy of x on the device\n", + "d_y = cuda.to_device(y) # Copy of y on the device\n", + "d_out = cuda.device_array_like(d_x) # Like np.empty_like, but for device arrays\n", + "\n", + "# Because of how we wrote the kernel above, we need a one-thread-to-one-data-element mapping,\n", + "# therefore we define the number of threads in the grid (128*32) to equal n (4096).\n", + "threads_per_block = 128\n", + "blocks_per_grid = 32" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[ 1 2 3 ... 4094 4095 4096]\n" + ] + } + ], + "source": [ + "add_kernel[blocks_per_grid, threads_per_block](d_x, d_y, d_out)\n", + "cuda.synchronize()\n", + "print(d_out.copy_to_host()) # Should be [1...4096]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Exercise: Tweak the Code\n", + "\n", + "Make the following minor changes to the code above to see how they affect its execution. Make educated guesses about what will happen before running the code:\n", + "\n", + "* Decrease the `threads_per_block` variable\n", + "* Decrease the `blocks_per_grid` variable\n", + "* Increase the `threads_per_block` and/or `blocks_per_grid` variables\n", + "* Remove or comment out the `cuda.synchronize()` call\n", + "\n", + "### Results\n", + "\n", + "In the example above, because the kernel is written so that each thread works on exactly one data element, it is essential for the number of threads in the grid to equal the number of data elements.\n", + "\n", + "By **reducing the number of threads in the grid**, either by reducing the number of blocks, and/or reducing the number of threads per block, there are elements where work is left undone and thus we can see in the output that the elements toward the end of the `d_out` array did not have any values added to them. 
If you edited the execution configuration by reducing the number of threads per block, then in fact there are other elements throughout the `d_out` array that were not processed.\n", + "\n", + "**Increasing the size of the grid** in fact creates issues with out of bounds memory access. This error will not show in your code presently, but later in this section you will learn how to expose this error using `cuda-memcheck` and debug it.\n", + "\n", + "You might have expected that **removing the synchronization point** would have resulted in a print showing that little or no work had been done. This is a reasonable guess since without a synchronization point the CPU will work asynchronously while the GPU is processing. The detail to learn here is that memory copies carry implicit synchronization, making the call to `cuda.synchronize` above unnecessary." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Exercise: Accelerate a CPU Function as a Custom CUDA Kernel\n", + "\n", + "Below is a CPU scalar function `square_device` that could be used as a CPU ufunc. Your job is to refactor it to run as a CUDA kernel decorated with the `@cuda.jit` decorator.\n", + "\n", + "You might think that making this function run on the device could be much more easily done with `@vectorize`, and you would be correct. 
But this scenario will give you a chance to work with all the syntax we've introduced before moving on to more complicated and realistic examples.\n", + "\n", + "In this exercise you will need to:\n", + "* Refactor the `square_device` definition to be a CUDA kernel that will do one thread's worth of work on a single element.\n", + "* Refactor the `d_a` and `d_out` arrays below to be CUDA device arrays.\n", + "* Modify the `blocks` and `threads` variables to appropriate values for the provided `n`.\n", + "* Refactor the call to `square_device` to be a kernel launch that includes an execution configuration.\n", + "\n", + "The assertion test below will fail until you successfully implement the above. If you get stuck, feel free to check out a [solution](../edit/solutions/square_device_solution.py)." + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# Refactor to be a CUDA kernel doing one thread's work.\n", + "# Don't forget that when using `@cuda.jit`, you must provide an output array as no value will be returned.\n", + "def square_device(a):\n", + " return a**2\n", + "\n", + "@cuda.jit\n", + "def square_kernel(a, out):\n", + " idx = cuda.grid(1)\n", + " out[idx] = a[idx]*a[idx]" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# Leave the values in this cell fixed for this exercise\n", + "n = 4096\n", + "\n", + "a = np.arange(n)\n", + "out = a**2 # `out` will only be used for testing below" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [], + "source": [ + "d_a = cuda.to_device(a) # TODO make `d_a` a device array\n", + "d_out = cuda.device_array_like(a) # TODO: make d_out a device array\n", + "\n", + "# TODO: Update the execution configuration for the amount of work needed\n", + "blocks = 128\n", + "threads = 32\n", + "\n", + "# TODO: Launch as a 
kernel with an appropriate execution configuration\n", + "# d_out = square_device(d_a)\n", + "square_kernel[blocks, threads](d_a, d_out)\n", + "cuda.synchronize()\n", + "d_out = d_out.copy_to_host()" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [], + "source": [ + "from numpy import testing\n", + "testing.assert_almost_equal(d_out, out)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## An Aside on Hiding Latency and Execution Configuration Choices" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "CUDA enabled NVIDIA GPUs consist of several [**Streaming Multiprocessors**](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#hardware-implementation), or **SMs** on a die, with attached DRAM. SMs contain all required resources for the execution of kernel code including many CUDA cores. When a kernel is launched, each block is assigned to a single SM, with potentially many blocks assigned to a single SM. SMs partition blocks into further subdivisions of 32 threads called **warps** and it is these warps which are given parallel instructions to execute.\n", + "\n", + "When an instruction takes more than one clock cycle to complete (or in CUDA parlance, to **expire**) the SM can continue to do meaningful work *if it has additional warps that are ready to be issued new instructions.* Because of very large register files on the SMs, there is no time penalty for an SM to change context between issuing instructions to one warp or another. 
In short, the latency of operations can be hidden by SMs with other meaningful work so long as there is other work to be done.\n", + "\n", + "**Therefore, of primary importance to utilizing the full potential of the GPU, and thereby writing performant accelerated applications, it is essential to give SMs the ability to hide latency by providing them with a sufficient number of warps which can be accomplished most simply by executing kernels with sufficiently large grid and block dimensions.**\n", + "\n", + "Deciding the very best size for the CUDA thread grid is a complex problem, and depends on both the algorithm and the specific GPU's [compute capability](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities), but here are some very rough heuristics that we tend to follow and which can work well for getting started:\n", + "\n", + " * The size of a block should be a multiple of 32 threads (the size of a warp), with typical block sizes between 128 and 512 threads per block.\n", + " * The size of the grid should ensure the full GPU is utilized where possible. Launching a grid where the number of blocks is 2x-4x the number of SMs on the GPU is a good starting place. Something in the range of 20 - 100 blocks is usually a good starting point.\n", + " * The CUDA kernel launch overhead does increase with the number of blocks, so when the input size is very large we find it best not to launch a grid where the number of threads equals the number of input elements, which would result in a tremendous number of blocks. Instead we use a pattern to which we will now turn our attention for dealing with large inputs." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Working on Large Datasets with Grid Stride Loops\n", + "\n", + "The following slides give a high level overview of a technique called a **grid stride loop** which will create flexible kernels where each thread is able to work on more than one data element, an essential technique for large datasets. Execute the cell to load the slides." + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + " <iframe\n", + " width=\"640\"\n", + " height=\"390\"\n", + " src=\"https://view.officeapps.live.com/op/view.aspx?src=https://developer.download.nvidia.com/training/courses/C-AC-02-V1/AC_CUDA_Python_2.pptx\"\n", + " frameborder=\"0\"\n", + " allowfullscreen\n", + " ></iframe>\n", + " " + ], + "text/plain": [ + "<IPython.lib.display.IFrame at 0x7f8d53069710>" + ] + }, + "execution_count": 37, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from IPython.display import IFrame\n", + "IFrame('https://view.officeapps.live.com/op/view.aspx?src=https://developer.download.nvidia.com/training/courses/C-AC-02-V1/AC_CUDA_Python_2.pptx', 640, 390)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## A First Grid Stride Loop\n", + "\n", + "Let's refactor the `add_kernel` above to utilize a grid stride loop so that we can launch it to work on larger data sets flexibly while gaining the benefits of global **memory coalescing**, which allows parallel threads to access memory in contiguous chunks, a scenario which the GPU can leverage to reduce the total number of memory operations:" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "from numba import cuda\n", + "\n", + "@cuda.jit\n", + "def add_kernel(x, y, out):\n", + " \n", + "\n", + " start = cuda.grid(1)\n", + " \n", + " # This calculation 
gives the total number of threads in the entire grid\n", + " stride = cuda.gridsize(1) # 1 = one dimensional thread grid, returns a single value.\n", + " # This Numba-provided convenience function is equivalent to\n", + " # `cuda.blockDim.x * cuda.gridDim.x`\n", + "\n", + " # This thread will start work at the data element index equal to that of its own\n", + " # unique index in the grid, and then, will stride the number of threads in the grid each\n", + " # iteration so long as it has not stepped out of the data's bounds. In this way, each\n", + " # thread may work on more than one data element, and together, all threads will work on\n", + " # every data element.\n", + " for i in range(start, x.shape[0], stride):\n", + " # Assuming x and y inputs are same length\n", + " out[i] = x[i] + y[i]" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(100000,)\n" + ] + } + ], + "source": [ + "import numpy as np\n", + "\n", + "n = 100000 # This is far more elements than threads in our grid\n", + "x = np.arange(n).astype(np.int32)\n", + "y = np.ones_like(x)\n", + "\n", + "d_x = cuda.to_device(x)\n", + "print(d_x.shape)\n", + "d_y = cuda.to_device(y)\n", + "d_out = cuda.device_array_like(d_x)\n", + "\n", + "threads_per_block = 128\n", + "blocks_per_grid = 30" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[ 1 2 3 ... 
99998 99999 100000]\n" + ] + } + ], + "source": [ + "add_kernel[blocks_per_grid, threads_per_block](d_x, d_y, d_out)\n", + "print(d_out.copy_to_host()) # Remember, memory copy carries implicit synchronization" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Exercise: Implement a Grid Stride Loop\n", + "\n", + "Refactor the following CPU scalar `hypot_stride` function to run as a CUDA Kernel utilizing a grid stride loop. Feel free to look at [the solution](../edit/solutions/hypot_stride_solution.py) if you get stuck." + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "from math import hypot\n", + "from numba import cuda\n", + "\n", + "@cuda.jit\n", + "def hypot_stride(a, b, c):\n", + " start = cuda.grid(1)\n", + " stride = cuda.gridsize(1)\n", + " for i in range(start, a.shape[0], stride):\n", + " c[i] = hypot(a[i], b[i])" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": {}, + "outputs": [], + "source": [ + "# You do not need to modify the contents in this cell\n", + "n = 1000000\n", + "a = np.random.uniform(-12, 12, n).astype(np.float32)\n", + "b = np.random.uniform(-12, 12, n).astype(np.float32)\n", + "d_a = cuda.to_device(a)\n", + "d_b = cuda.to_device(b)\n", + "d_c = cuda.device_array_like(d_b)\n", + "\n", + "blocks = 128\n", + "threads_per_block = 64\n", + "\n", + "hypot_stride[blocks, threads_per_block](d_a, d_b, d_c)" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "from numpy import testing\n", + "# This assertion will fail until you successfully implement the hypot_stride kernel above\n", + "testing.assert_almost_equal(np.hypot(a,b), d_c.copy_to_host(), decimal=5)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Timing the Kernel\n", + "\n", + "Let's take the time to do some performance timing for the 
`hypot_stride` kernel. If you weren't able to successfully implement it, copy and execute [the solution](../edit/solutions/hypot_stride_solution.py) before timing." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### CPU Baseline\n", + "\n", + "First let's get a baseline with `np.hypot`:" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "6.07 ms ± 2.38 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" + ] + } + ], + "source": [ + "%timeit np.hypot(a, b)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Numba on the CPU\n", + "\n", + "Next let's see about a CPU optimized version:" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "from numba import jit\n", + "\n", + "@jit\n", + "def numba_hypot(a, b):\n", + " return np.hypot(a, b)" + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "5.68 ms ± 1.72 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" + ] + } + ], + "source": [ + "%timeit numba_hypot(a, b)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Single Threaded on the Device\n", + "\n", + "Just to see, let's launch our kernel in a grid with only a single thread. Here we will use `%time`, which only runs the statement once to ensure our measurement isn't affected by the finite depth of the CUDA kernel queue. 
We will also add a `cuda.synchronize` to be sure we don't get any inaccurate times on account of returning control to the CPU, where the timer is, before the kernel completes:" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "CPU times: user 188 ms, sys: 128 ms, total: 316 ms\n", + "Wall time: 316 ms\n" + ] + } + ], + "source": [ + "%time hypot_stride[1, 1](d_a, d_b, d_c); cuda.synchronize()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Hopefully it is not too much of a surprise that this is way slower than even the baseline CPU execution." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Parallel on the Device" + ] + }, + { + "cell_type": "code", + "execution_count": 55, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "CPU times: user 0 ns, sys: 0 ns, total: 0 ns\n", + "Wall time: 696 µs\n" + ] + } + ], + "source": [ + "%time hypot_stride[128, 64](d_a, d_b, d_c); cuda.synchronize()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "That's much faster!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Atomic Operations and Avoiding Race Conditions\n", + "\n", + "CUDA, like many general purpose parallel execution frameworks, makes it possible to have race conditions in your code. A race condition in CUDA arises when threads read from or write to a memory location that might be modified by another independent thread. 
Generally speaking, you need to worry about:\n", + "\n", + " * read-after-write hazards: One thread is reading a memory location at the same time another thread might be writing to it.\n", + " * write-after-write hazards: Two threads are writing to the same memory location, and only one write will be visible when the kernel is complete.\n", + " \n", + "A common strategy to avoid both of these hazards is to organize your CUDA kernel algorithm such that each thread has exclusive responsibility for unique subsets of output array elements, and/or to never use the same array for both input and output in a single kernel call. (Iterative algorithms can use a double-buffering strategy if needed, and switch input and output arrays on each iteration.)\n", + "\n", + "However, there are many cases where different threads need to combine results. Consider something very simple, like: \"every thread increments a global counter.\" Implementing this in your kernel requires each thread to:\n", + "\n", + "1. Read the current value of a global counter.\n", + "2. Compute `counter + 1`.\n", + "3. Write that value back to global memory.\n", + "\n", + "However, there is no guarantee that another thread has not changed the global counter between steps 1 and 3. To resolve this problem, CUDA provides **atomic operations** which will read, modify and update a memory location in one, indivisible step. 
Numba supports several of these functions, [described here](http://numba.pydata.org/numba-doc/dev/cuda/intrinsics.html#supported-atomic-operations).\n", + "\n", + "Let's make our thread counter kernel:" + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "@cuda.jit\n", + "def thread_counter_race_condition(global_counter):\n", + " global_counter[0] += 1 # This is bad\n", + " \n", + "@cuda.jit\n", + "def thread_counter_safe(global_counter):\n", + " cuda.atomic.add(global_counter, 0, 1) # Safely add 1 to offset 0 in global_counter array" + ] + }, + { + "cell_type": "code", + "execution_count": 57, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Should be 4096: [1]\n" + ] + } + ], + "source": [ + "# This gets the wrong answer\n", + "global_counter = cuda.to_device(np.array([0], dtype=np.int32))\n", + "thread_counter_race_condition[64, 64](global_counter)\n", + "\n", + "print('Should be %d:' % (64*64), global_counter.copy_to_host())" + ] + }, + { + "cell_type": "code", + "execution_count": 58, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Should be 4096: [4096]\n" + ] + } + ], + "source": [ + "# This works correctly\n", + "global_counter = cuda.to_device(np.array([0], dtype=np.int32))\n", + "thread_counter_safe[64, 64](global_counter)\n", + "\n", + "print('Should be %d:' % (64*64), global_counter.copy_to_host())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Assessment\n", + "\n", + "The following exercise will require you to utilize everything you've learned so far. Unlike previous exercises, there will not be any solution code available to you, and, there are a couple additional steps you will need to take to \"run the assessment\" and get a score for your attempt(s). 
**Please read the directions carefully before beginning your work to ensure the best chance at successfully completing the assessment.**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### How to Run the Assessment\n", + "\n", + "Take the following steps to complete this assessment:\n", + "\n", + "1. Using the instructions that follow, work on the cells below as you usually would for an exercise.\n", + "2. When you are satisfied with your work, follow the instructions below to copy and paste your code into the linked source code files. Be sure to save the files after you paste your work.\n", + "3. Return to the browser tab you used to launch this notebook, and click on the **\"Assess\"** button. After a few seconds a score will be generated along with a helpful message.\n", + "\n", + "You are welcome to click on the **Assess** button as many times as you like, so if you don't pass the first time, feel free to make additional modifications to your code and repeat steps 1 through 3. Good luck!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": true + }, + "source": [ + "### Write an Accelerated Histogramming Kernel\n", + "\n", + "For this assessment, you will create an accelerated histogramming kernel. This will take an array of input data, a range, and a number of bins, and count how many of the input data elements land in each bin. 
Below is a working CPU implementation of histogramming to serve as an example for your work:" + ] + }, + { + "cell_type": "code", + "execution_count": 62, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "def cpu_histogram(x, xmin, xmax, histogram_out):\n", + " '''Increment bin counts in histogram_out, given histogram range [xmin, xmax).'''\n", + " # Note that we don't have to pass in nbins explicitly, because the size of histogram_out determines it\n", + " nbins = histogram_out.shape[0]\n", + " bin_width = (xmax - xmin) / nbins\n", + " \n", + " # This is a very slow way to do this with NumPy, but looks similar to what you will do on the GPU\n", + " for element in x:\n", + " bin_number = np.int32((element - xmin)/bin_width)\n", + " if bin_number >= 0 and bin_number < histogram_out.shape[0]:\n", + " # only increment if in range\n", + " histogram_out[bin_number] += 1" + ] + }, + { + "cell_type": "code", + "execution_count": 63, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ 3, 88, 445, 1576, 2969, 2854, 1548, 442, 72, 3],\n", + " dtype=int32)" + ] + }, + "execution_count": 63, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x = np.random.normal(size=10000, loc=0, scale=1).astype(np.float32)\n", + "xmin = np.float32(-4.0)\n", + "xmax = np.float32(4.0)\n", + "histogram_out = np.zeros(shape=10, dtype=np.int32)\n", + "\n", + "cpu_histogram(x, xmin, xmax, histogram_out)\n", + "\n", + "histogram_out" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Using a grid stride loop and atomic operations, implement your solution in the cell below. After making any modifications, and before running the assessment, paste this cell's content into [**`assessment/histogram.py`**](../edit/assessment/histogram.py) and save it." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 73, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "@cuda.jit\n", + "def cuda_histogram(x, xmin, xmax, histogram_out):\n", + " '''Increment bin counts in histogram_out, given histogram range [xmin, xmax).'''\n", + " nbins = histogram_out.shape[0]\n", + " bin_width = (xmax - xmin) / nbins\n", + " \n", + " start = cuda.grid(1)\n", + " stride = cuda.gridsize(1)\n", + " for i in range(start, x.shape[0], stride):\n", + " bin_number = np.int32((x[i] - xmin)/bin_width)\n", + " if bin_number >= 0 and bin_number < histogram_out.shape[0]:\n", + " cuda.atomic.add(histogram_out, bin_number, 1)\n", + " pass # Replace this with your implementation" + ] + }, + { + "cell_type": "code", + "execution_count": 74, + "metadata": {}, + "outputs": [], + "source": [ + "d_x = cuda.to_device(x)\n", + "d_histogram_out = cuda.to_device(np.zeros(shape=10, dtype=np.int32))\n", + "\n", + "blocks = 128\n", + "threads_per_block = 64\n", + "\n", + "cuda_histogram[blocks, threads_per_block](d_x, xmin, xmax, d_histogram_out)" + ] + }, + { + "cell_type": "code", + "execution_count": 75, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# This assertion will fail until you correctly implement `cuda_histogram`\n", + "np.testing.assert_array_almost_equal(d_histogram_out.copy_to_host(), histogram_out, decimal=2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Summary\n", + "\n", + "In this section you learned how to:\n", + "\n", + "* Write custom CUDA kernels in Python and launch them with an execution configuration.\n", + "* Utilize grid stride loops for working in parallel over large data sets and leveraging memory coalescing.\n", + "* Use atomic operations to avoid race conditions when working in parallel." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Download Content\n", + "\n", + "To download the contents of this notebook, execute the following cell and then click the download link below. Note: If you run this notebook on a local Jupyter server, you can expect some of the file path links in the notebook to be broken because they are tailored to our own platform. You can still navigate to the files through the Jupyter file navigator." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "!tar -zcvf section2.tar.gz ." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Download files from this section.](files/section2.tar.gz)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Appendix: Troubleshooting and Debugging\n", + "\n", + "### Note about the Terminal\n", + "\n", + "Debugging is an important part of programming. Unfortunately, it is pretty difficult to debug CUDA kernels directly in the Jupyter notebook for a variety of reasons, so this notebook will show terminal commands by executing Jupyter notebook cells using the shell. These shell commands will appear in notebook cells with the command line prefixed by `!`. When applying the debug methods described in this notebook, you will likely run the commands in the terminal directly.\n", + "\n", + "### Printing\n", + "\n", + "A common debugging strategy is printing to the console. Numba supports printing from CUDA kernels, with some restrictions. 
Note that output printed from a CUDA kernel will not be captured by Jupyter, so you will need to debug with a script you can run from the terminal.\n", + "\n", + "Let's look at a CUDA kernel with a bug:" + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "import numpy as np\r\n", + "\r\n", + "from numba import cuda\r\n", + "\r\n", + "@cuda.jit\r\n", + "def histogram(x, xmin, xmax, histogram_out):\r\n", + "    nbins = histogram_out.shape[0]\r\n", + "    bin_width = (xmax - xmin) / nbins\r\n", + "\r\n", + "    start = cuda.grid(1)\r\n", + "    stride = cuda.gridsize(1)\r\n", + "\r\n", + "    for i in range(start, x.shape[0], stride):\r\n", + "        bin_number = np.int32((x[i] - xmin)/bin_width)\r\n", + "        if bin_number >= 0 and bin_number < histogram_out.shape[0]:\r\n", + "            histogram_out[bin_number] += 1\r\n", + "\r\n", + "x = np.random.normal(size=50, loc=0, scale=1).astype(np.float32)\r\n", + "xmin = np.float32(-4.0)\r\n", + "xmax = np.float32(4.0)\r\n", + "histogram_out = np.zeros(shape=10, dtype=np.int32)\r\n", + "\r\n", + "histogram[64, 64](x, xmin, xmax, histogram_out)\r\n", + "\r\n", + "print('input count:', x.shape[0])\r\n", + "print('histogram:', histogram_out)\r\n", + "print('count:', histogram_out.sum())\r\n" + ] + } + ], + "source": [ + "! cat debug/ex1.py" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": true + }, + "source": [ + "When we run this code to histogram 50 values, we see the histogram is not getting 50 entries: " + ] + }, + { + "cell_type": "code", + "execution_count": 76, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "input count: 50\r\n", + "histogram: [0 1 1 1 1 1 1 0 0 0]\r\n", + "count: 6\r\n" + ] + } + ], + "source": [ + "! 
python debug/ex1.py" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "*(You might have already spotted the mistake, but let's pretend we don't know the answer.)*\n", + "\n", + "We hypothesize that maybe a bin calculation error is causing many of the histogram entries to appear out of range. Let's add some printing around the `if` statement to show us what is going on:" + ] + }, + { + "cell_type": "code", + "execution_count": 61, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "import numpy as np\r\n", + "\r\n", + "from numba import cuda\r\n", + "\r\n", + "@cuda.jit\r\n", + "def histogram(x, xmin, xmax, histogram_out):\r\n", + "    nbins = histogram_out.shape[0]\r\n", + "    bin_width = (xmax - xmin) / nbins\r\n", + "\r\n", + "    start = cuda.grid(1)\r\n", + "    stride = cuda.gridsize(1)\r\n", + "\r\n", + "    for i in range(start, x.shape[0], stride):\r\n", + "        bin_number = np.int32((x[i] - xmin)/bin_width)\r\n", + "        if bin_number >= 0 and bin_number < histogram_out.shape[0]:\r\n", + "            histogram_out[bin_number] += 1\r\n", + "            print('in range', x[i], bin_number)\r\n", + "        else:\r\n", + "            print('out of range', x[i], bin_number)\r\n", + "\r\n", + "x = np.random.normal(size=50, loc=0, scale=1).astype(np.float32)\r\n", + "xmin = np.float32(-4.0)\r\n", + "xmax = np.float32(4.0)\r\n", + "histogram_out = np.zeros(shape=10, dtype=np.int32)\r\n", + "\r\n", + "histogram[64, 64](x, xmin, xmax, histogram_out)\r\n", + "\r\n", + "print('input count:', x.shape[0])\r\n", + "print('histogram:', histogram_out)\r\n", + "print('count:', histogram_out.sum())\r\n" + ] + } + ], + "source": [ + "! cat debug/ex1a.py" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This kernel will print every value and bin number it calculates. 
Looking at one of the print statements, we see that `print` supports constant strings, and scalar values:\n", + "\n", + "``` python\n", + "print('in range', x[i], bin_number)\n", + "```\n", + "\n", + "String substitution (using C printf syntax or the newer `format()` syntax) is not supported. If we run this script we see:" + ] + }, + { + "cell_type": "code", + "execution_count": 77, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "in range 0.261026 5\r\n", + "in range -2.102431 2\r\n", + "in range 0.799183 5\r\n", + "in range 1.051908 6\r\n", + "in range -0.201711 4\r\n", + "in range -1.698864 2\r\n", + "in range 0.248762 5\r\n", + "in range 1.782836 7\r\n", + "in range -0.594408 4\r\n", + "in range 1.867431 7\r\n", + "in range 0.418070 5\r\n", + "in range 0.365282 5\r\n", + "in range -0.655639 4\r\n", + "in range 0.817385 6\r\n", + "in range 0.646000 5\r\n", + "in range 0.776718 5\r\n", + "in range -0.665656 4\r\n", + "in range 0.431279 5\r\n", + "in range 0.480257 5\r\n", + "in range 0.769916 5\r\n", + "in range 0.386032 5\r\n", + "in range -0.824273 3\r\n", + "in range -0.310682 4\r\n", + "in range -1.554290 3\r\n", + "in range 1.897843 7\r\n", + "in range -0.788933 4\r\n", + "in range -0.509624 4\r\n", + "in range -0.854971 3\r\n", + "in range 0.470186 5\r\n", + "in range 1.196934 6\r\n", + "in range 0.821883 6\r\n", + "in range 1.011266 6\r\n", + "in range -3.438190 0\r\n", + "in range 0.612806 5\r\n", + "in range 0.789266 5\r\n", + "in range -2.211243 2\r\n", + "in range 1.039794 6\r\n", + "in range 2.000385 7\r\n", + "in range -1.390927 3\r\n", + "in range 1.432608 6\r\n", + "in range 0.208954 5\r\n", + "in range -1.194161 3\r\n", + "in range 0.558909 5\r\n", + "in range 0.494454 5\r\n", + "in range 0.149325 5\r\n", + "in range -0.593924 4\r\n", + "in range 0.702312 5\r\n", + "in range 0.765463 5\r\n", + "in range -1.847362 2\r\n", + "in range 2.459083 8\r\n", + "input count: 50\r\n", + "histogram: [1 0 1 1 1 
1 1 1 1 0]\r\n", + "count: 8\r\n" + ] + } + ], + "source": [ + "! python debug/ex1a.py" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Scanning down that output, we see that all 50 values should be in range. Clearly we have some kind of race condition updating the histogram. In fact, the culprit line is:\n", + "\n", + "``` python\n", + "histogram_out[bin_number] += 1\n", + "```\n", + "\n", + "which should be (as you may have seen in a previous exercise)\n", + "\n", + "``` python\n", + "cuda.atomic.add(histogram_out, bin_number, 1)\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": true + }, + "source": [ + "### CUDA Simulator\n", + "\n", + "Back in the early days of CUDA, `nvcc` had an \"emulator\" mode that would execute CUDA code on the CPU for debugging. That functionality was dropped in later CUDA releases after `cuda-gdb` was created. There isn't a debugger for CUDA+Python, so Numba includes a \"CUDA simulator\" that runs your CUDA code with the Python interpreter on the host CPU. 
This allows you to debug the logic of your code using Python modules and functions that would otherwise not be allowed by the compiler.\n", + "\n", + "A very common use case is to start the Python debugger inside one thread of a CUDA kernel:\n", + "``` python\n", + "import numpy as np\n", + "\n", + "from numba import cuda\n", + "\n", + "@cuda.jit\n", + "def histogram(x, xmin, xmax, histogram_out):\n", + "    nbins = histogram_out.shape[0]\n", + "    bin_width = (xmax - xmin) / nbins\n", + "\n", + "    start = cuda.grid(1)\n", + "    stride = cuda.gridsize(1)\n", + "\n", + "    ### DEBUG FIRST THREAD\n", + "    if start == 0:\n", + "        from pdb import set_trace; set_trace()\n", + "    ###\n", + "\n", + "    for i in range(start, x.shape[0], stride):\n", + "        bin_number = np.int32((x[i] + xmin)/bin_width)\n", + "\n", + "        if bin_number >= 0 and bin_number < histogram_out.shape[0]:\n", + "            cuda.atomic.add(histogram_out, bin_number, 1)\n", + "\n", + "x = np.random.normal(size=50, loc=0, scale=1).astype(np.float32)\n", + "xmin = np.float32(-4.0)\n", + "xmax = np.float32(4.0)\n", + "histogram_out = np.zeros(shape=10, dtype=np.int32)\n", + "\n", + "histogram[64, 64](x, xmin, xmax, histogram_out)\n", + "\n", + "print('input count:', x.shape[0])\n", + "print('histogram:', histogram_out)\n", + "print('count:', histogram_out.sum())\n", + "```\n", + "\n", + "This code allows a debug session like the following to take place:\n", + "```\n", + "(gtc2017) 0179-sseibert:gtc2017-numba sseibert$ NUMBA_ENABLE_CUDASIM=1 python debug/ex2.py\n", + "> /Users/sseibert/continuum/conferences/gtc2017-numba/debug/ex2.py(18)histogram()\n", + "-> for i in range(start, x.shape[0], stride):\n", + "(Pdb) n\n", + "> /Users/sseibert/continuum/conferences/gtc2017-numba/debug/ex2.py(19)histogram()\n", + "-> bin_number = np.int32((x[i] + xmin)/bin_width)\n", + "(Pdb) n\n", + "> /Users/sseibert/continuum/conferences/gtc2017-numba/debug/ex2.py(21)histogram()\n", + "-> if bin_number >= 0 and bin_number < histogram_out.shape[0]:\n", + 
"(Pdb) p bin_number, x[i]\n", + "(-6, -1.4435024)\n", + "(Pdb) p x[i], xmin, bin_width\n", + "(-1.4435024, -4.0, 0.80000000000000004)\n", + "(Pdb) p (x[i] - xmin) / bin_width\n", + "3.1956219673156738\n", + "(Pdb) q\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": true + }, + "source": [ + "### CUDA Memcheck\n", + "\n", + "Another common error occurs when a CUDA kernel has an invalid memory access, typically caused by running off the end of an array. The full CUDA toolkit from NVIDIA (not the `cudatoolkit` conda package) contains a utility called `cuda-memcheck` that can check for a wide range of memory access mistakes in CUDA code.\n", + "\n", + "Let's debug the following code:" + ] + }, + { + "cell_type": "code", + "execution_count": 78, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "import numpy as np\r\n", + "\r\n", + "from numba import cuda\r\n", + "\r\n", + "@cuda.jit\r\n", + "def histogram(x, xmin, xmax, histogram_out):\r\n", + "    nbins = histogram_out.shape[0]\r\n", + "    bin_width = (xmax - xmin) / nbins\r\n", + "\r\n", + "    start = cuda.grid(1)\r\n", + "    stride = cuda.gridsize(1)\r\n", + "\r\n", + "    for i in range(start, x.shape[0], stride):\r\n", + "        bin_number = np.int32((x[i] + xmin)/bin_width)\r\n", + "\r\n", + "        if bin_number >= 0 or bin_number < histogram_out.shape[0]:\r\n", + "            cuda.atomic.add(histogram_out, bin_number, 1)\r\n", + "\r\n", + "x = np.random.normal(size=50, loc=0, scale=1).astype(np.float32)\r\n", + "xmin = np.float32(-4.0)\r\n", + "xmax = np.float32(4.0)\r\n", + "histogram_out = np.zeros(shape=10, dtype=np.int32)\r\n", + "\r\n", + "histogram[64, 64](x, xmin, xmax, histogram_out)\r\n", + "\r\n", + "print('input count:', x.shape[0])\r\n", + "print('histogram:', histogram_out)\r\n", + "print('count:', histogram_out.sum())\r\n" + ] + } + ], + "source": [ + "! 
cat debug/ex3.py" + ] + }, + { + "cell_type": "code", + "execution_count": 79, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "========= CUDA-MEMCHECK\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (31,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001e8 is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python 
[0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 
0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (30,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001ec is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) 
[0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) 
[0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (29,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001f8 is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so 
(_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python 
(_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (28,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001e8 is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host 
Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python 
(_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (27,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001f0 is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) 
(cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python 
[0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, 
Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (26,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001f8 is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) 
[0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", 
+ "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (25,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001fc is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python 
(_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host 
Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (24,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001e4 is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python 
[0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python 
(PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (23,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001ec is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + 
"========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + 
"========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (22,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001ec is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 
(ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python 
(_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (21,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001ec is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, 
aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host 
Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (20,0,0) in block (0,0,0)\n", + "========= Address 
0x7f43844001f8 is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python 
[0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) 
[0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (19,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001ec is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python 
(_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + 
"========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (18,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001f4 is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host 
Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python 
(_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (17,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001e8 is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so 
[0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) 
[0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (16,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001f0 is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host 
Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + 
"========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (15,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001f0 is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host 
backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 
0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (14,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001f4 is out of bounds\n", + "========= 
Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 
0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write 
of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (13,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001f0 is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host 
Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host 
Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (12,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001ec is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + 
"========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python 
[0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (11,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001e8 is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + 
"========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + 
"========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (10,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001e8 is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host 
Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host 
Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (9,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001e4 is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so 
(cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 
0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (8,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001e8 is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, 
aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host 
Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, 
float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (7,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001f0 is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 
0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) 
[0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (6,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001e8 is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python 
(_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host 
Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (5,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001f4 is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python 
[0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python 
(PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (4,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001ec is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + 
"========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + 
"========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (3,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001ec is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 
(ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python 
(_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (2,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001ec is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, 
aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host 
Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (1,0,0) in block (0,0,0)\n", + "========= Address 
0x7f43844001ec is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python 
[0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) 
[0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (0,0,0) in block (0,0,0)\n", + "========= Address 0x7f43844001f4 is out of bounds\n", + "========= Device Frame:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x900)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python 
(_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + 
"========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "========= Program hit CUDA_ERROR_LAUNCH_FAILED (error 719) due to \"unspecified launch failure\" on CUDA API call to cuMemcpyDtoH_v2. \n", + "========= Saved host backtrace up to driver entry point at error\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuMemcpyDtoH_v2 + 0x1c9) [0x291fe9]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x966) [0x193d06]\n", + "========= Host Frame:python [0x1944d4]\n", + "Traceback (most recent call last):\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + " File \"debug/ex3.py\", line 24, in <module>\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) 
[0x1bc31b]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python [0x1945e6]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191e46]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + " histogram[64, 64](x, xmin, xmax, histogram_out)\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + " File \"/home/appuser/Miniconda3/lib/python3.6/site-packages/numba/cuda/compiler.py\", line 755, in __call__\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 
0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + " cfg(*args)\n", + " File \"/home/appuser/Miniconda3/lib/python3.6/site-packages/numba/cuda/compiler.py\", line 494, in __call__\n", + " sharedmem=self.sharedmem)\n", + " File \"/home/appuser/Miniconda3/lib/python3.6/site-packages/numba/cuda/compiler.py\", line 596, in _kernel_call\n", + " wb()\n", + " File \"/home/appuser/Miniconda3/lib/python3.6/site-packages/numba/cuda/args.py\", line 65, in <lambda>\n", + " retr.append(lambda: devary.copy_to_host(self.value, stream=stream))\n", + " File \"/home/appuser/Miniconda3/lib/python3.6/site-packages/numba/cuda/cudadrv/devices.py\", line 212, in _require_cuda_context\n", + " return fn(*args, **kws)\n", + " File \"/home/appuser/Miniconda3/lib/python3.6/site-packages/numba/cuda/cudadrv/devicearray.py\", line 252, in copy_to_host\n", + " _driver.device_to_host(hostary, self, self.alloc_size, stream=stream)\n", + " File \"/home/appuser/Miniconda3/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py\", line 1819, in device_to_host\n", + " fn(host_pointer(dst), device_pointer(src), size, *varargs)\n", + " File \"/home/appuser/Miniconda3/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py\", line 290, in safe_cuda_api_call\n", + " self._check_error(fname, retcode)\n", + " 
File \"/home/appuser/Miniconda3/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py\", line 325, in _check_error\n", + " raise CudaAPIError(retcode, msg)\n", + "numba.cuda.cudadrv.driver.CudaAPIError: [719] Call to cuMemcpyDtoH results in CUDA_ERROR_LAUNCH_FAILED\n", + "========= ERROR SUMMARY: 33 errors\n" + ] + } + ], + "source": [ + "! cuda-memcheck python debug/ex3.py" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The output of `cuda-memcheck` clearly shows a problem with our histogram kernel:\n", + "```\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00000900 in cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "```\n", + "Each error reports an out-of-bounds write, but not which source line performed it. To get better error information, we can turn on debug mode when compiling the kernel by passing `debug=True` to the `@cuda.jit` decorator:\n", + "``` python\n", + "@cuda.jit(debug=True)\n", + "def histogram(x, xmin, xmax, histogram_out):\n", + "    nbins = histogram_out.shape[0]\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": 80, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "========= CUDA-MEMCHECK\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (31,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001f4 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host 
backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 
0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (30,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001f0 is out 
of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python 
[0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) 
[0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (29,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001f0 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host 
Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host 
Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (28,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001f0 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + 
"========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 
0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (27,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001f0 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so 
(_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python 
(_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (26,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001f0 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", 
+ "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host 
Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (25,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001e8 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, 
int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + 
"========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in 
/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (24,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001f0 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + 
"========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= 
Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (23,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001e8 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host 
Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python 
(_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (22,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001f4 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host 
Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host 
Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (21,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001f4 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host 
Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python 
(_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (20,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001e4 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, 
mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host 
Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, 
int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (19,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001ec is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + 
"========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host 
Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (18,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001f4 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host 
Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + 
"========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (17,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001ec is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python 
(PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python 
(_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (16,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001ec is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host 
Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + 
"========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (15,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001f0 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, 
mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + 
"========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (14,0,0) in block 
(0,0,0)\n", + "========= Address 0x7f4f464001f0 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 
0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", 
+ "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (13,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001e8 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 
0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 
0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (12,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001e8 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= 
Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= 
Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (11,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001f0 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host 
Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host 
Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (10,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001f4 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host 
Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= 
Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (9,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001f8 is out of bounds\n", + "========= Device 
Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host 
Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + 
"========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (8,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001e8 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + 
"========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 
0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (7,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001e8 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python 
[0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + 
"========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (6,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001e8 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) 
[0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) 
[0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (5,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001f4 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host 
Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python 
(_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (4,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001e8 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, 
mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host 
Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, 
int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (3,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001e8 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + 
"========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host 
Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (2,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001f4 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host 
Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + 
"========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (1,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001e8 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python 
(PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python 
(_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n", + "========= Invalid __global__ write of size 4\n", + "========= at 0x00001bb0 in /dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>)\n", + "========= by thread (0,0,0) in block (0,0,0)\n", + "========= Address 0x7f4f464001f0 is out of bounds\n", + "========= Device Frame:/dli/task/debug/ex3a.py:17:cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) (cudapy::__main__::histogram$241(Array<float, int=1, C, mutable, aligned>, float, float, Array<int, int=1, C, mutable, aligned>) : 0x1bb0)\n", + "========= Saved host backtrace up to driver entry point at kernel launch time\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuLaunchKernel + 0x346) [0x297db6]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host 
Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x19296b]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0xd0) [0x1161e0]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + 
"========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "=========\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "========= Program hit CUDA_ERROR_LAUNCH_FAILED (error 719) due to \"unspecified launch failure\" on CUDA API call to cuMemcpyDtoH_v2. 
\n", + "========= Saved host backtrace up to driver entry point at error\n", + "========= Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so (cuMemcpyDtoH_v2 + 0x1c9) [0x291fe9]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call_unix64 + 0x4c) [0x6adc]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/../../libffi.so.6 (ffi_call + 0x1f2) [0x6282]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so (_ctypes_callproc + 0x2ce) [0x12d6e]\n", + "========= Host Frame:/home/appuser/Miniconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so [0x137a5]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x966) [0x193d06]\n", + "========= Host Frame:python [0x1944d4]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "Traceback (most recent call last):\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + " File \"debug/ex3a.py\", line 24, in <module>\n", + "========= Host Frame:python [0x192b83]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python [0x191fae]\n", + "========= Host Frame:python [0x192be6]\n", + "========= Host Frame:python [0x198a65]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x10cb) [0x1bc31b]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python 
(PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x1ab0) [0x1bcd00]\n", + "========= Host Frame:python [0x191b76]\n", + "========= Host Frame:python (_PyFunction_FastCallDict + 0x1be) [0x19308e]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x26f) [0x1116ff]\n", + "========= Host Frame:python (_PyObject_Call_Prepend + 0x63) [0x116173]\n", + "========= Host Frame:python (PyObject_Call + 0x3e) [0x11113e]\n", + "========= Host Frame:python [0x16a101]\n", + "========= Host Frame:python (_PyObject_FastCallDict + 0x8b) [0x11151b]\n", + "========= Host Frame:python [0x198ade]\n", + "========= Host Frame:python (_PyEval_EvalFrameDefault + 0x30a) [0x1bb55a]\n", + "========= Host Frame:python (PyEval_EvalCodeEx + 0x329) [0x1936c9]\n", + "========= Host Frame:python (PyEval_EvalCode + 0x1c) [0x19445c]\n", + "========= Host Frame:python [0x214d54]\n", + " histogram[64, 64](x, xmin, xmax, histogram_out)\n", + "========= Host Frame:python (PyRun_FileExFlags + 0xa1) [0x215151]\n", + " File \"/home/appuser/Miniconda3/lib/python3.6/site-packages/numba/cuda/compiler.py\", line 755, in __call__\n", + "========= Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x215353]\n", + "========= Host Frame:python (Py_Main + 0x613) [0x218e43]\n", + "========= Host Frame:python (main + 0xee) [0xe328e]\n", + "========= Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf5) [0x21f45]\n", + "========= Host Frame:python [0x1c1fff]\n", + "=========\n", + " cfg(*args)\n", + " File \"/home/appuser/Miniconda3/lib/python3.6/site-packages/numba/cuda/compiler.py\", line 494, in __call__\n", + " sharedmem=self.sharedmem)\n", + " File \"/home/appuser/Miniconda3/lib/python3.6/site-packages/numba/cuda/compiler.py\", line 571, in _kernel_call\n", + " driver.device_to_host(ctypes.addressof(excval), excmem, excsz)\n", + 
" File \"/home/appuser/Miniconda3/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py\", line 1819, in device_to_host\n", + " fn(host_pointer(dst), device_pointer(src), size, *varargs)\n", + " File \"/home/appuser/Miniconda3/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py\", line 290, in safe_cuda_api_call\n", + " self._check_error(fname, retcode)\n", + " File \"/home/appuser/Miniconda3/lib/python3.6/site-packages/numba/cuda/cudadrv/driver.py\", line 325, in _check_error\n", + " raise CudaAPIError(retcode, msg)\n", + "numba.cuda.cudadrv.driver.CudaAPIError: [719] Call to cuMemcpyDtoH results in CUDA_ERROR_LAUNCH_FAILED\n", + "========= ERROR SUMMARY: 33 errors\n" + ] + } + ], + "source": [ + "! cuda-memcheck python debug/ex3a.py" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we get an error message that includes a source file and line number: `ex3a.py:17`." + ] + }, + { + "cell_type": "code", + "execution_count": 81, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " 15\t\r\n", + " 16\t if bin_number >= 0 or bin_number < histogram_out.shape[0]:\r\n", + " 17\t cuda.atomic.add(histogram_out, bin_number, 1)\r\n", + " 18\t\r\n", + " 19\tx = np.random.normal(size=50, loc=0, scale=1).astype(np.float32)\r\n" + ] + } + ], + "source": [ + "! cat -n debug/ex3a.py | grep -C 2 \"17\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "At this point, we might realize that our `if` statement incorrectly uses `or` instead of `and`. With `or`, the bounds check is true for every value of `bin_number`, so threads whose bin falls outside the histogram still write to `histogram_out`.\n", + "\n", + "`cuda-memcheck` has different modes for detecting different kinds of problems (similar to `valgrind` for debugging CPU memory access errors). 
Take a look at the documentation for more information: http://docs.nvidia.com/cuda/cuda-memcheck/" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Appendix: CUDA References\n", + "\n", + "It's worth bookmarking Chapters 1 and 2 of the CUDA C Programming Guide for study after the completion of this course. They are written for CUDA C, but are still highly applicable to programming CUDA Python.\n", + "\n", + " * Introduction: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#introduction\n", + " * Programming Model: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#programming-model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Appendix: Random Number Generation on the GPU with Numba\n", + "\n", + "GPUs can be extremely useful for Monte Carlo applications where you need to use large amounts of random numbers. CUDA ships with an excellent set of random number generation algorithms in the cuRAND library. Unfortunately, cuRAND is defined in a set of C headers which Numba can't easily compile or link to. (Numba's CUDA JIT does not ever create C code for CUDA kernels.) It is on the Numba roadmap to find a solution to this problem, but it may take some time.\n", + "\n", + "In the meantime, Numba version 0.33 and later includes the `xoroshiro128+` generator, which is pretty high quality, though with a smaller period ($2^{128} - 1$) than the XORWOW generator in cuRAND.\n", + "\n", + "To use it, you will want to initialize the RNG state on the host for each thread in your kernel. This state creation function initializes each state to be in the same sequence designated by the seed, but separated by $2^{64}$ steps from each other. 
This ensures that different threads will not accidentally end up with overlapping sequences (unless a single thread draws $2^{64}$ random numbers, which you won't have patience for):" + ] + }, + { + "cell_type": "code", + "execution_count": 82, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "from numba import cuda\n", + "from numba.cuda.random import create_xoroshiro128p_states, xoroshiro128p_uniform_float32\n", + "\n", + "threads_per_block = 64\n", + "blocks = 24\n", + "rng_states = create_xoroshiro128p_states(threads_per_block * blocks, seed=1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can use these random number states in our kernel by passing them in as an argument:" + ] + }, + { + "cell_type": "code", + "execution_count": 91, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "@cuda.jit\n", + "def monte_carlo_mean(rng_states, iterations, out):\n", + " thread_id = cuda.grid(1)\n", + " total = 0\n", + " for i in range(iterations):\n", + " sample = xoroshiro128p_uniform_float32(rng_states, thread_id) # Returns a float32 in range [0.0, 1.0)\n", + " total += sample\n", + " \n", + " out[thread_id] = total/iterations" + ] + }, + { + "cell_type": "code", + "execution_count": 98, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "0.49990836\n" + ] + } + ], + "source": [ + "out = cuda.device_array(threads_per_block * blocks, dtype=np.float32)\n", + "monte_carlo_mean[blocks, threads_per_block](rng_states, 10000, out)\n", + "print(out.copy_to_host().mean())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Exercise: Monte Carlo Pi on the GPU\n", + "\n", + "Let's revisit the Monte Carlo Pi algorithm from the first section, where we compiled it with Numba on the CPU." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 99, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "from numba import njit\n", + "import random\n", + "\n", + "@njit\n", + "def monte_carlo_pi(nsamples):\n", + " acc = 0\n", + " for i in range(nsamples):\n", + " x = random.random()\n", + " y = random.random()\n", + " if (x**2 + y**2) < 1.0:\n", + " acc += 1\n", + " return 4.0 * acc / nsamples" + ] + }, + { + "cell_type": "code", + "execution_count": 102, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "105 ms ± 30.8 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" + ] + } + ], + "source": [ + "nsamples = 10000000\n", + "%timeit monte_carlo_pi(nsamples)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Your task is to refactor `monte_carlo_pi_device` below, currently identical to `monte_carlo_pi` above, to run on the GPU. You can use `monte_carlo_mean` above for inspiration, but at a minimum you will need to:\n", + "\n", + "- Decorate it as a CUDA kernel\n", + "- Draw samples for the thread from the device RNG state (generated 2 cells below)\n", + "- Store each thread's result in an output array which will be averaged on the host (as was done for `monte_carlo_mean` above)\n", + "\n", + "If you look two cells below you will see that all the data has already been initialized, the execution configuration created, and the kernel launched. All you need to do is refactor the kernel definition in the cell immediately below. Check out [the solution](../edit/solutions/monte_carlo_pi_solution.py) if you get stuck." + ] + }, + { + "cell_type": "code", + "execution_count": 116, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "from numba import njit\n", + "import random\n", + "\n", + "# TODO: All your work will be in this cell. 
Refactor to run on the device successfully given the way the\n", + "# kernel is launched below.\n", + "@cuda.jit\n", + "def monte_carlo_pi_device(rng_states, nsamples, out):\n", + " idx = cuda.grid(1)\n", + "\n", + " if idx < out.size:\n", + " acc = 0\n", + " for i in range(nsamples):\n", + " x = xoroshiro128p_uniform_float32(rng_states, idx)\n", + " y = xoroshiro128p_uniform_float32(rng_states, idx)\n", + " if x*x + y*y < 1.0:\n", + " acc += 1\n", + " out[idx] = 4.0 * acc / nsamples" + ] + }, + { + "cell_type": "code", + "execution_count": 117, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "# Do not change any of the values in this cell\n", + "nsamples = 10000000\n", + "threads_per_block = 128\n", + "blocks = 32\n", + "\n", + "grid_size = threads_per_block * blocks\n", + "samples_per_thread = int(nsamples / grid_size) # Each thread only needs to work on a fraction of the total number of samples.\n", + " # This could also be calculated inside the kernel definition using `cuda.gridsize(1)`.\n", + "\n", + "rng_states = create_xoroshiro128p_states(grid_size, seed=1)\n", + "d_out = cuda.device_array(threads_per_block * blocks, dtype=np.float32)" + ] + }, + { + "cell_type": "code", + "execution_count": 118, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1.04 ms ± 62.5 µs per loop (mean ± std. dev. 
of 7 runs, 100 loops each)\n" + ] + } + ], + "source": [ + "%timeit monte_carlo_pi_device[blocks, threads_per_block](rng_states, samples_per_thread, d_out); cuda.synchronize()" + ] + }, + { + "cell_type": "code", + "execution_count": 114, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "3.140668\n" + ] + } + ], + "source": [ + "print(d_out.copy_to_host().mean())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "<a href=\"https://www.nvidia.com/dli\"> <img src=\"images/DLI Header.png\" alt=\"Header\" style=\"width: 400px;\"/> </a>" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.10" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} |
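The Monte Carlo Pi exercise in the notebook above splits the total sample count evenly across the grid, has each thread store its own estimate, and averages those estimates on the host. The following is a minimal CPU-only sketch of that decomposition, not the notebook's solution: the function name `monte_carlo_pi_per_thread` is hypothetical, NumPy's default generator stands in for the `xoroshiro128+` device states, and the sample count is reduced so it runs quickly without a GPU.

```python
import numpy as np


def monte_carlo_pi_per_thread(grid_size, samples_per_thread, seed=1):
    """CPU stand-in for the kernel: row i plays the role of GPU thread i.

    Hypothetical helper for illustration only. It uses NumPy's default
    generator rather than Numba's xoroshiro128+ device RNG.
    """
    rng = np.random.default_rng(seed)
    x = rng.random((grid_size, samples_per_thread), dtype=np.float32)
    y = rng.random((grid_size, samples_per_thread), dtype=np.float32)
    # Count the points of each "thread" that fall inside the unit quarter circle.
    hits = (x * x + y * y < 1.0).sum(axis=1)
    # One estimate of pi per thread, like out[idx] in the kernel.
    return (4.0 * hits / samples_per_thread).astype(np.float32)


threads_per_block = 128
blocks = 32
grid_size = threads_per_block * blocks

nsamples = 1_000_000  # assumption: smaller than the notebook's 10,000,000
samples_per_thread = nsamples // grid_size  # integer division drops the remainder

per_thread = monte_carlo_pi_per_thread(grid_size, samples_per_thread)
# Averaging on the host mirrors d_out.copy_to_host().mean().
print(per_thread.mean())
```

Because every thread draws the same number of samples, the mean of the per-thread estimates equals the estimate computed from all samples pooled together, which is why a plain `.mean()` on the host is sufficient here.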
