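Python: NumPy and diff()
I am trying to create a diff of a sorted numpy array, so that if I record the values of the first row plus the diffs, I can recreate the original table while storing less data.
So, here is an example of the table: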
my_array = numpy.array([(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2),
(9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 34),
(9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 35),
(9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36)
],'uint8,uint8,uint8,uint8,uint8,uint8,uint8,uint8,uint8,uint8,uint8,uint8,uint8,uint8')
After running numpy.diff(my_array), I would expect something like this:
[(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
(9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 32),
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1)
]
Note: The data above comes from the first and last three rows of the 'real' data, which is much, much larger. With the full dataset, most of the rows after a diff would be 0,0,0,0,0,0,0,0,0,0,0,0,0,1, which a) can be stored in a much smaller struct, and b) will compress fantastically well on disk, since most rows contain very similar data.
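The idea of keeping only the first row plus the row-to-row differences can be sketched as follows. This is only an illustration under stated assumptions: it uses a plain 2-D uint8 array rather than the structured dtype shown above, the four-row table is made up, and the differences are taken per column, so they wrap modulo 256 whenever a byte decreases.

```python
import numpy as np

# Hypothetical small table: a plain 2-D uint8 array, one row per number.
table = np.array([[0, 0, 1],
                  [0, 0, 2],
                  [9, 36, 34],
                  [9, 37, 3]], dtype=np.uint8)

first_row = table[0]
# Per-column difference between consecutive rows; uint8 arithmetic wraps mod 256.
deltas = np.diff(table, axis=0)

# Rebuild the table from the first row and the deltas.
restored = np.empty_like(table)
restored[0] = first_row
# Accumulate in uint8 so the wraparound of the diff is undone exactly.
restored[1:] = first_row + np.cumsum(deltas, axis=0, dtype=np.uint8)
```

Because both the diff and the cumulative sum are done modulo 256, the round trip is lossless even where a byte goes down from one row to the next (here the last column of the final row, 34 to 3).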
I should probably point out that the reason I have a whole bunch of uint8s in the first place is that I needed to store an array of extremely large numbers in the smallest amount of memory possible. The largest number was 185439173519100986733232011757860, which is too big for uint64. In fact, the smallest number of bits needed to store it is 108 bits, or 14 bytes (rounded up to the nearest byte). So to fit these large numbers into numpy, I use the following two functions:
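def large_number_to_numpy(number, columns):
    # Split the integer into `columns` byte values, most significant byte first.
    return tuple((number >> (8 * x)) & 255 for x in range(columns - 1, -1, -1))

def numpy_to_large_number(numbers):
    # Reassemble the integer; int() keeps numpy uint8 values from overflowing on shift.
    return sum(int(y) << (8 * x) for x, y in enumerate(numbers[::-1]))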
These are used like this:
>>> large_number_to_numpy(185439173519100986733232011757860L,14)
(9L, 36L, 146L, 73L, 36L, 146L, 73L, 36L, 146L, 73L, 36L, 146L, 73L, 36L)
>>> numpy_to_large_number((9L, 36L, 146L, 73L, 36L, 146L, 73L, 36L, 146L, 73L, 36L, 146L, 73L, 36L))
185439173519100986733232011757860L
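For comparison, the same big-endian byte split can be sketched with the built-in int.to_bytes and int.from_bytes. This assumes Python 3 (the session above is Python 2.7), and the two function names are hypothetical stand-ins for the helpers above, not part of the original code.

```python
def large_number_to_bytes(number, columns):
    # Big-endian split into exactly `columns` byte values (0..255).
    return tuple(number.to_bytes(columns, byteorder="big"))

def bytes_to_large_number(numbers):
    # Inverse of the split: rebuild the integer from its byte values.
    return int.from_bytes(bytes(numbers), byteorder="big")

n = 185439173519100986733232011757860
row = large_number_to_bytes(n, 14)
restored = bytes_to_large_number(row)
```

One behavioural difference worth noting: to_bytes raises OverflowError when the number does not fit in the requested number of bytes, whereas the shift-and-mask version silently drops the high bits.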
With the array created like this:
my_array = numpy.zeros(TOTAL_ROWS,','.join(14*['uint8']))
And then populated with:
my_array[x] = large_number_to_numpy(large_number,14)
But I get this instead:
>>> my_array
array([(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1),
(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2),
(9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 34),
(9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 35),
(9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36)],
dtype=[('f0', 'u1'), ('f1', 'u1'), ('f2', 'u1'), ('f3', 'u1'), ('f4', 'u1'), ('f5', 'u1'), ('f6', 'u1'), ('f7', 'u1'), ('f8', 'u1'), ('f9', 'u1'), ('f10', 'u1'), ('f11', 'u1'), ('f12', 'u1'), ('f13', 'u1')])
>>> numpy.diff(my_array)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/numpy/lib/function_base.py", line 1567, in diff
    return a[slice1]-a[slice2]
TypeError: ufunc 'subtract' did not contain a loop with signature matching types dtype([('f0', 'u1'), ('f1', 'u1'), ('f2', 'u1'), ('f3', 'u1'), ('f4', 'u1'), ('f5', 'u1'), ('f6', 'u1'), ('f7', 'u1'), ('f8', 'u1'), ('f9', 'u1'), ('f10', 'u1'), ('f11', 'u1'), ('f12', 'u1'), ('f13', 'u1')]) dtype([('f0', 'u1'), ('f1', 'u1'), ('f2', 'u1'), ('f3', 'u1'), ('f4', 'u1'), ('f5', 'u1'), ('f6', 'u1'), ('f7', 'u1'), ('f8', 'u1'), ('f9', 'u1'), ('f10', 'u1'), ('f11', 'u1'), ('f12', 'u1'), ('f13', 'u1')]) dtype([('f0', 'u1'), ('f1', 'u1'), ('f2', 'u1'), ('f3', 'u1'), ('f4', 'u1'), ('f5', 'u1'), ('f6', 'u1'), ('f7', 'u1'), ('f8', 'u1'), ('f9', 'u1'), ('f10', 'u1'), ('f11', 'u1'), ('f12', 'u1'), ('f13', 'u1')])
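The TypeError says that the subtract ufunc has no loop for this structured dtype, so numpy.diff cannot subtract one record from another. One possible workaround, sketched here under assumptions, is to reinterpret the records as a plain 2-D uint8 array and diff along the rows. The sketch assumes all fields are uint8 with no padding (as in the dtype above) and that the array is contiguous. The three-row array and the names plain and row_diff are only illustrative.

```python
import numpy as np

# Small structured array with the same 14 x uint8 layout as in the question.
dtype = ",".join(14 * ["uint8"])
my_array = np.zeros(3, dtype)
my_array[1] = (0,) * 13 + (1,)
my_array[2] = (9, 36, 146, 73, 36, 146, 73, 36, 146, 73, 36, 146, 73, 34)

# Reinterpret the same memory as raw bytes, one row of 14 bytes per record.
plain = my_array.view(np.uint8).reshape(len(my_array), -1)

# axis=0 differences consecutive rows; the default axis would diff within a row.
row_diff = np.diff(plain, axis=0)
```

The view shares memory with my_array, so no copy is made. The result is a per-column byte difference (wrapping modulo 256), which matches the shape of the expected output above but is not the arithmetic difference of the underlying large numbers.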