[Python Data Science Handbook] Ch 2 Introduction to NumPy

Digressions


@ZYX, written on October 6, 2020

Chapter 2 Introduction to NumPy

Understanding Data Types in Python

A Python Integer Is More Than Just an Integer

  • In CPython, a Python integer is really a pointer to a position in memory containing all the Python object information (reference count, type, size), including the bytes that contain the integer value itself.
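This sketch makes the overhead concrete. It compares the size of a Python int object with the size of one element of a NumPy int64 array. It assumes 64-bit CPython. The exact byte count from sys.getsizeof varies by version and platform, so only the comparison is meaningful.

```python
import sys
import numpy as np

# Size of a complete Python int object (object header plus the integer digits).
# The exact number depends on the CPython version and platform.
py_int_size = sys.getsizeof(1)

# Size of one raw 64-bit integer as stored inside a NumPy int64 array.
raw_int_size = np.dtype(np.int64).itemsize

print(py_int_size, raw_int_size)
```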

A Python List Is More Than Just a List

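  • A Python list contains a pointer to a block of pointers, each of which in turn points to a full Python object such as the Python integer described above. This allows elements of different types, at the cost of extra memory and indirection compared with a fixed-type array.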
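This rough sketch estimates what the extra indirection costs in memory, assuming CPython. It adds the list's own pointer block to the size of each int object the list refers to. It then compares that total with the single contiguous buffer of an int64 ndarray (arr.nbytes). CPython caches small integers, so the list figure is only an approximation.

```python
import sys
import numpy as np

values = list(range(1000))

# A list stores one pointer per element; each pointer refers to a separate int object.
list_container = sys.getsizeof(values)                 # list object plus its pointer block
list_objects = sum(sys.getsizeof(v) for v in values)   # the int objects themselves
# Small ints (-5 to 256) are cached and shared in CPython, so this is a rough estimate.
list_total = list_container + list_objects

# An ndarray stores the raw values contiguously in one buffer.
arr = np.array(values, dtype=np.int64)
array_total = arr.nbytes  # 1000 * 8 bytes of data, excluding the small array header

print(list_total, array_total)
```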

Fixed-Type Arrays in Python

  • The built-in array module (part of the Python standard library) can be used to create dense arrays of a uniform type
    In[6]: import array
           L = list(range(10))
           A = array.array('i', L)
           A
    Out[6]: array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    
    • Much more useful is the ndarray object of the NumPy package.
      • While Python's array object provides efficient storage of array-based data, NumPy adds efficient operations on that data.
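This sketch shows the difference between storage and operations. It rebuilds L and A from In[6] and adds an ndarray N for comparison. On array.array, as on list, the * operator means repetition. On an ndarray, * multiplies element-wise.

```python
import array
import numpy as np

L = list(range(10))
A = array.array('i', L)   # compact storage, but no element-wise math
N = np.array(L)           # compact storage plus vectorized operations

# For array.array (like list), '*' with an int means repetition.
repeated = A * 2
# For ndarray, '*' multiplies every element.
doubled = N * 2

print(len(repeated))   # 20
print(doubled)
```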

Creating Arrays from Python Lists

  • First, we can use np.array to create arrays from Python lists:
    In[8]: import numpy as np
           # integer array:
           np.array([1, 4, 2, 5, 3])
    Out[8]: array([1, 4, 2, 5, 3])
    
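    • NumPy arrays, unlike Python lists, must contain elements that all have the same type
      • If the types do not match, NumPy will upcast if possible (e.g., integers are upcast to floating point when mixed with floats)
      • To set the data type of the resulting array explicitly, use the dtype keyword:
        In[10]: np.array([1, 2, 3, 4], dtype='float32')
        Out[10]: array([1., 2., 3., 4.], dtype=float32)
    • NumPy arrays can also be explicitly multidimensional (e.g., built from a list of lists)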
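This short sketch demonstrates the three behaviors listed above: upcasting, an explicit dtype, and a multidimensional array built from nested sequences. The nested example uses a list comprehension of ranges, so each inner range becomes one row.

```python
import numpy as np

# Mixed ints and a float: NumPy upcasts everything to floating point.
mixed = np.array([3.14, 4, 2, 3])
print(mixed.dtype)  # float64

# Explicit dtype via the dtype keyword.
f32 = np.array([1, 2, 3, 4], dtype='float32')
print(f32.dtype)  # float32

# Nested sequences give a multidimensional array; each inner range becomes a row.
nested = np.array([range(i, i + 3) for i in [2, 4, 6]])
print(nested.shape)  # (3, 3)
print(nested)
```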

Creating Arrays from Scratch

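  • numpy.zeros(shape, dtype=float, order='C')
    • Create an array filled with zeros
  • numpy.ones(shape, dtype=None, order='C')
    • Create an array filled with ones
  • numpy.full(shape, fill_value, dtype=None, order='C')
    • Create an array filled with fill_value
  • numpy.arange([start, ] stop, [step, ] dtype=None, *, like=None)
    • Create evenly spaced values in the half-open interval [start, stop) with the given step, like the built-in range
  • numpy.eye(N, M=None, k=0, dtype=<class 'float'>, order='C', *, like=None)
    • Create a 2-D array with ones on the k-th diagonal and zeros elsewhere; with the defaults this is the N-by-N identity matrix
  • numpy.empty(shape[, dtype, order, like])
    • Create an uninitialized array
    • The values will be whatever happens to already exist at that memory location
  • numpy.linspace(start, stop[, num, endpoint, …])
    • Create num evenly spaced values over [start, stop] (stop is included by default; num defaults to 50)
  • numpy.random.random(size=None)
    • Create an array of the given shape filled with uniform random values in [0, 1)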
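This sketch calls each of the constructors above once. The shapes and fill values are arbitrary choices for illustration. np.empty is included only to show its shape, since its contents are unspecified.

```python
import numpy as np

zeros = np.zeros(10, dtype=int)          # ten integer zeros
ones = np.ones((3, 5), dtype=float)      # 3x5 array of 1.0
full = np.full((3, 5), 3.14)             # 3x5 array filled with 3.14
steps = np.arange(0, 20, 2)              # 0, 2, ..., 18 (stop is excluded)
grid = np.linspace(0, 1, 5)              # 5 evenly spaced values from 0 to 1 inclusive
eye = np.eye(3)                          # 3x3 identity matrix
uninit = np.empty(3)                     # arbitrary contents; always overwrite before use
rand = np.random.random((3, 3))          # uniform samples in [0, 1)

print(steps)
print(grid)
```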

NumPy Standard Data Types

  • bool_ Boolean (True or False) stored as a byte
  • int_ Default integer type (same as C long; normally either int64 or int32)
  • intc Identical to C int (normally int32 or int64)
  • intp Integer used for indexing (same as C ssize_t; normally either int32 or int64)
  • int8 Byte (-128 to 127)
  • int16 Integer (-32768 to 32767)
  • int32 Integer (-2147483648 to 2147483647)
  • int64 Integer (-9223372036854775808 to 9223372036854775807)
  • uint8 Unsigned integer (0 to 255)
  • uint16 Unsigned integer (0 to 65535)
  • uint32 Unsigned integer (0 to 4294967295)
  • uint64 Unsigned integer (0 to 18446744073709551615)
  • float_ Shorthand for float64 (the np.float_ alias was removed in NumPy 2.0; use float64)
  • float16 Half-precision float: sign bit, 5 bits exponent, 10 bits mantissa
  • float32 Single-precision float: sign bit, 8 bits exponent, 23 bits mantissa
  • float64 Double-precision float: sign bit, 11 bits exponent, 52 bits mantissa
  • complex_ Shorthand for complex128 (the np.complex_ alias was removed in NumPy 2.0; use complex128)
  • complex64 Complex number, represented by two 32-bit floats
  • complex128 Complex number, represented by two 64-bit floats
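A dtype can be passed either as a string or as the NumPy type object. The ranges above can also be checked at runtime. This sketch uses np.iinfo (integer limits) and dtype.itemsize. The particular types are picked for illustration.

```python
import numpy as np

# A dtype can be given as a string or as the corresponding NumPy type object.
a = np.zeros(4, dtype='int16')
b = np.zeros(4, dtype=np.int16)
print(a.dtype == b.dtype)  # True

# itemsize is the number of bytes per element.
print(np.dtype('int8').itemsize, np.dtype('float32').itemsize, np.dtype('complex128').itemsize)  # 1 4 16

# np.iinfo reports the representable range of an integer type.
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)  # -128 127
print(np.iinfo(np.uint16).max)                        # 65535
```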

The Basics of NumPy Arrays

NumPy Array Attributes

  • ndarray.shape
    • the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the number of axes, ndim.
  • ndarray.size
    • the total number of elements of the array.
  • ndarray.dtype
    • an object describing the type of the elements in the array
  • ndarray.itemsize
    • the size in bytes of each element of the array.
  • ndarray.data
    • the buffer containing the actual elements of the array. Normally, we won’t need to use this attribute
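This sketch prints these attributes for a small 3-D array of random integers. It uses np.random.default_rng, the newer Generator API that is not used elsewhere in these notes, with a fixed seed. It also shows nbytes, which is simply size * itemsize.

```python
import numpy as np

rng = np.random.default_rng(0)              # seeded generator for reproducibility
x3 = rng.integers(0, 10, size=(3, 4, 5))    # 3-D integer array (int64 by default)

print("ndim:    ", x3.ndim)      # 3
print("shape:   ", x3.shape)     # (3, 4, 5)
print("size:    ", x3.size)      # 60
print("dtype:   ", x3.dtype)
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:  ", x3.nbytes, "bytes")  # equals size * itemsize
```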

Array Indexing: Accessing Single Elements
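Single elements are accessed with square brackets, as with Python lists. Negative indices count from the end, and a multidimensional array takes a comma-separated tuple of indices. Here is a minimal sketch with illustrative arrays x1 and x2. Because the array has a fixed type, assigning a float to an integer array silently truncates the value.

```python
import numpy as np

x1 = np.array([5, 0, 3, 3, 7, 9])
x2 = np.array([[3, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])

print(x1[0], x1[-1])   # 5 9  (negative indices count from the end)
print(x2[0, 0])        # 3    (row, column) in one pair of brackets
print(x2[2, -1])       # 7

# Assigning a float into an integer array silently truncates it.
x1[0] = 3.14159
print(x1[0])           # 3
```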

Subarrays as no-copy views
  • One important (and extremely useful) thing to know about array slices is that they return views rather than copies of the array data, so modifying a slice also modifies the original array
Creating copies of arrays
  • When a true copy is needed, this can most easily be done with the copy() method
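This minimal sketch shows both behaviors on an illustrative 3-by-4 array. Writing through a slice changes the original, while writing to a .copy() does not. np.shares_memory is used here only as a check.

```python
import numpy as np

x2 = np.arange(12).reshape(3, 4)

# Slicing returns a view: modifying it modifies x2 as well.
sub_view = x2[:2, :2]
sub_view[0, 0] = 99
print(x2[0, 0])   # 99

# .copy() returns an independent array.
sub_copy = x2[:2, :2].copy()
sub_copy[0, 0] = -1
print(x2[0, 0])   # still 99
```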

Reshaping of Arrays

  • ndarray.reshape(shape): the size of the initial array must match the size of the reshaped array
  • the np.newaxis object (an alias for None) within a slice operation:
    In[39]: x = np.array([1, 2, 3])
            # row vector via reshape
            x.reshape((1, 3))
    Out[39]: array([[1, 2, 3]])
    In[40]: # row vector via newaxis
            x[np.newaxis, :]
    Out[40]: array([[1, 2, 3]])
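The same two techniques also produce column vectors. This sketch also passes -1 to reshape so that NumPy infers that dimension. This is standard NumPy behavior, though these notes do not otherwise mention it.

```python
import numpy as np

x = np.array([1, 2, 3])

# Column vector, two equivalent ways.
col_a = x.reshape((3, 1))
col_b = x[:, np.newaxis]
print(col_a.shape, col_b.shape)   # (3, 1) (3, 1)

# The total size must be preserved; -1 lets NumPy infer one dimension.
grid = np.arange(1, 10).reshape(3, -1)
print(grid.shape)                 # (3, 3)

# np.newaxis is simply an alias for None.
print(np.newaxis is None)         # True
```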
    

Array Concatenation and Splitting

Concatenation of arrays
  • Concatenation, or joining of two arrays in NumPy, is primarily accomplished through the following routines:
    1. np.concatenate
      • takes a tuple or list of arrays as its first argument
      • can also be used for two-dimensional arrays
    2. np.vstack
      In[48]: x = np.array([1, 2, 3])
              grid = np.array([[9, 8, 7],
                               [6, 5, 4]])
              # vertically stack the arrays
              np.vstack([x, grid])
      Out[48]: array([[1, 2, 3],
                      [9, 8, 7],
                      [6, 5, 4]])
      
    3. np.hstack
      In[49]: # horizontally stack the arrays
              y = np.array([[99],
                            [99]])
              np.hstack([grid, y])
      Out[49]: array([[ 9,  8,  7, 99],
                      [ 6,  5,  4, 99]])
      
    4. Similarly, np.dstack will stack arrays along the third axis.
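This sketch applies np.concatenate to one- and two-dimensional input. The axis argument selects the joining axis and defaults to 0. It also shows np.dstack, which turns two 2-D arrays into a 3-D result. The arrays are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
print(np.concatenate([x, y]))        # [1 2 3 3 2 1]

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
rows = np.concatenate([grid, grid])          # along the first axis (axis=0): shape (4, 3)
cols = np.concatenate([grid, grid], axis=1)  # along the second axis: shape (2, 6)

# dstack: pair up two 2-D arrays along a new third axis.
depth = np.dstack([grid, grid])              # shape (2, 3, 2)
print(rows.shape, cols.shape, depth.shape)
```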
Splitting of arrays
  • np.split, np.hsplit, np.vsplit, and np.dsplit: for each of these, we can pass a list of indices giving the split points
    • Notice that N split points lead to N + 1 subarrays
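This minimal sketch exercises the splitting routines. Two split points, [3, 5], give three pieces. vsplit and hsplit split a 2-D grid by rows and by columns respectively. The data values are arbitrary.

```python
import numpy as np

x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])   # 2 split points -> 3 pieces
print(x1, x2, x3)                   # [1 2 3] [99 99] [3 2 1]

grid = np.arange(16).reshape((4, 4))
upper, lower = np.vsplit(grid, [2])   # split rows
left, right = np.hsplit(grid, [2])    # split columns
print(upper.shape, left.shape)        # (2, 4) (4, 2)
```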

Computation on NumPy Arrays: Universal Functions

  • The key to making computation on NumPy arrays fast is to use vectorized operations, generally implemented through NumPy's universal functions (ufuncs)

The Slowness of Loops

  • The bottleneck is not the arithmetic itself but the type checking and function dispatch that CPython must perform at each iteration of the loop
  • Vectorized operation:
    For many types of operations, NumPy provides a convenient interface to statically typed, compiled routines that avoid this per-element overhead. This is known as a vectorized operation.

    • It is designed to push the loop into the compiled layer that underlies NumPy, leading to much faster execution.
  • In short: an operation applied directly to all elements of an ndarray at once
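This sketch contrasts an explicit Python loop with the equivalent vectorized expression. compute_reciprocals is an illustrative helper written for this comparison. Both versions give the same result, but only the second runs its loop inside NumPy's compiled code. Timings (e.g. from IPython's %timeit) are left out because they depend on the machine.

```python
import numpy as np

def compute_reciprocals(values):
    """Loop version (illustrative): one Python-level type check and dispatch per element."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

rng = np.random.default_rng(0)
values = rng.integers(1, 10, size=5)   # integers in [1, 10), so no division by zero

looped = compute_reciprocals(values)
vectorized = 1.0 / values              # the loop runs inside NumPy's compiled code
print(np.allclose(looped, vectorized))  # True
```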

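Array arithmetic

  • + np.add Addition (e.g., 1 + 1 = 2)
  • - np.subtract Subtraction (e.g., 3 - 2 = 1)
  • - np.negative Unary negation (e.g., -2)
  • * np.multiply Multiplication (e.g., 2 * 3 = 6)
  • / np.divide Division (e.g., 3 / 2 = 1.5)
  • // np.floor_divide Floor division (e.g., 3 // 2 = 1)
  • ** np.power Exponentiation (e.g., 2 ** 3 = 8)
  • % np.mod Modulus/remainder (e.g., 9 % 4 = 1)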
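Each operator above is shorthand for the corresponding ufunc. This sketch checks that equivalence on a small array and then combines several operators in one expression. The values are arbitrary.

```python
import numpy as np

x = np.arange(4)   # [0 1 2 3]

# Each operator is a thin wrapper around the corresponding ufunc.
assert np.array_equal(x + 2, np.add(x, 2))
assert np.array_equal(x - 2, np.subtract(x, 2))
assert np.array_equal(-x, np.negative(x))
assert np.array_equal(x * 2, np.multiply(x, 2))
assert np.array_equal(x / 2, np.divide(x, 2))
assert np.array_equal(x // 2, np.floor_divide(x, 2))
assert np.array_equal(x ** 2, np.power(x, 2))
assert np.array_equal(x % 2, np.mod(x, 2))

# Operators can be combined freely; standard precedence applies (** before unary -).
result = -(0.5 * x + 1) ** 2
print(result)
```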
Absolute value
  • np.absolute()
  • np.abs() (shorthand for np.absolute; for complex input it returns the magnitude)
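This brief sketch shows both cases. On real input the absolute value is taken element-wise. On complex input a + bj, np.abs returns the magnitude sqrt(a**2 + b**2). The sample values are illustrative.

```python
import numpy as np

x = np.array([-2, -1, 0, 1, 2])
absolute_x = np.absolute(x)
print(absolute_x)            # [2 1 0 1 2]
print(np.abs(x))             # same result via the shorthand

# For complex input, the absolute value is the magnitude.
z = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])
magnitudes = np.abs(z)
print(magnitudes)            # [5. 5. 2. 1.]
```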
Trigonometric functions
  • np.sin()
  • np.cos()
  • np.tan()
  • np.arcsin()
  • np.arccos()
  • np.arctan()
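This sketch applies the trigonometric ufuncs and their inverses to a few angles. The results are floating point, so values that should be 0 come out as tiny numbers around 1e-16. Comparisons should therefore use np.allclose.

```python
import numpy as np

theta = np.linspace(0, np.pi, 3)   # [0, pi/2, pi]
print(np.sin(theta))   # approximately [0, 1, 0]; sin(pi) is tiny but not exactly 0
print(np.cos(theta))   # approximately [1, 0, -1]

x = np.array([-1, 0, 1])
print(np.arcsin(x))    # [-pi/2, 0, pi/2]
print(np.arctan(x))    # [-pi/4, 0, pi/4]
```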
Exponents and logarithms
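  • np.exp(x): $e^{x}$
  • np.exp2(x): $2^{x}$
  • np.power(a, x): $a^{x}$, e.g. np.power(3, x) for $3^{x}$
  • np.log(x): natural logarithm $\ln x$
  • np.log2(x): $\log_2 x$
  • np.log10(x): $\log_{10} x$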
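This sketch runs the exponential and logarithm ufuncs on small arrays. It ends with a check that np.log undoes np.exp up to rounding. The inputs are arbitrary.

```python
import numpy as np

x = np.array([1, 2, 3])
print(np.exp(x))        # e**x
print(np.exp2(x))       # 2**x -> 2, 4, 8
print(np.power(3, x))   # 3**x -> 3, 9, 27

y = np.array([1, 2, 4, 10])
print(np.log(y))        # natural logarithm
print(np.log2(y))       # base-2 logarithm
print(np.log10(y))      # base-10 logarithm

# exp and log are inverses (up to floating-point rounding).
print(np.allclose(np.log(np.exp(x)), x))   # True
```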