Intro to Python Modules and Packages


Intro to Python Modules and Packages

Modular programming refers to the process of breaking a large, unwieldy programming task into separate, smaller, more manageable subtasks or modules. Individual modules can then be cobbled together like building blocks to create a larger application.

Functions, modules and packages are all constructs in Python that promote code modularization.

Python Modules: Overview

There are actually three different ways to define a module in Python:

  • A module can be written in Python itself.
  • A module can be written in C and loaded dynamically at run-time, like the re (regular expression) module.
  • A built-in module is intrinsically contained in the interpreter, like the itertools module.

A module’s contents are accessed the same way in all three cases: with the import statement.

The Module Search Path

Continuing with the above example, let’s take a look at what happens when Python executes the statement:import mod

When the interpreter executes the above import statement, it searches for mod.pyin a list of directories assembled from the following sources:

  • The directory from which the input script was run or the current directory if the interpreter is being run interactively
  • The list of directories contained in the PYTHONPATH environment variable, if it is set. (The format for PYTHONPATH is OS-dependent but should mimic the PATH environment variable.)
  • An installation-dependent list of directories configured at the time Python is installed

The import Statement

Module contents are made available to the caller with the import statement. The import statement takes many different forms, shown below.

The statement import <module_name>only places <module_name> in the caller’s symbol table. The objects that are defined in the module remain in the module’s private symbol table.

From the caller, objects in the module are only accessible when prefixed with <module_name> via dot notation, as illustrated below.

An alternate form of the import statement allows individual objects from the module to be imported directly into the caller’s symbol table:

from <module_name> import <name(s)>

It is also possible to import individual objects but enter them into the local symbol table with alternate names:

from <module_name> import <name> as <alt_name>[, <name> as <alt_name> …]

You can also import an entire module under an alternate name:

import <module_name> as <alt_name>

The dir() Function

The built-in function dir()returns a list of defined names in a namespace. Without arguments, it produces an alphabetically sorted list of names in the current local symbol table.

Executing a Module as a Script

Any .py file that contains a module is essentially also a Python script, and there isn’t any reason it can’t be executed like one.

Wouldn’t it be nice if you could distinguish between when the file is loaded as a module and when it is run as a standalone script?

Ask and ye shall receive.

When a .py file is imported as a module, Python sets the special dunder variable __name__to the name of the module. However, if a file is run as a standalone script, __name__ is (creatively) set to the string '__main__'. Using this fact, you can discern which is the case at run-time and alter behavior accordingly.

s = "If Comrade Napoleon says it, it must be right."
a = [100, 200, 300]

def foo(arg):
    print(f'arg = {arg}')

class Foo:
    pass

if (__name__ == '__main__'):
    print('Executing as standalone script')
    print(s)
    print(a)
    foo('quux')
    x = Foo()
    print(x)

Now, if you run as a script, you get output, But if you import as a module, you don’t.

Reloading a Module

For reasons of efficiency, a module is only loaded once per interpreter session. That is fine for function and class definitions, which typically make up the bulk of a module’s contents. But a module can contain executable statements as well, usually for initialization. Be aware that these statements will only be executed the first time a module is imported.

Python Packages

Packages allow for a hierarchical structuring of the module namespace using dot notation. In the same way that modules help avoid collisions between global variable names, packages help avoid collisions between module names.

if the pkg directory resides in a location where it can be found (in one of the directories contained in sys.path), you can refer to the two modules with dot notation (pkg.mod1, pkg.mod2) and import them with the syntax you are already familiar with:

import <module_name>[, <module_name> ...]

from pkg.mod1 import foo

Package Initialization

If a file named __init__.py is present in a package directory, it is invoked when the package or a module in the package is imported. This can be used for execution of package initialization code, such as initialization of package-level data.

Importing * From a Package

Python follows this convention: if the __init__.py file in the package directory contains a list named __all__, it is taken to be a list of modules that should be imported when the statement from <package_name> import * is encountered.

from <package_name> import *

Subpackages

Packages can contain nested subpackages to arbitrary depth. For example, let’s make one more modification to the example package directory as follows: img


pytest: helps you write better programs

The pytest framework makes it easy to write small tests, yet scales to support complex functional testing for applications and libraries.

An example of a simple test:

# content of test_sample.py
def inc(x):
    return x + 1


def test_answer():
    assert inc(3) == 5

then just run pytest to excute it.

Why use PyTest?

Some of the advantages of pytest are

  • Very easy to start with because of its simple and easy syntax.
  • Can run tests in parallel.
  • Can run a specific test or a subset of tests
  • Automatically detect tests
  • Skip tests
  • Open source

Assertions in PyTest

Assertions are checks that return either True or False status. In pytest, if an assertion fails in a test method, then that method execution is stopped there. The remaining code in that test method is not executed, and pytest will continue with the next test method.

How pytest identifies the test files and test methods

By default pytest only identifies the file names starting with test_ or ending with _test as the test files. We can explicitly mention other filenames though (explained later). Pytest requires the test method names to start with “test.” All other method names will be ignored even if we explicitly ask to run those methods.

Run multiple tests from a specific file and multiple files.

Currently, inside the folder study_pytest, we have a file test_sample1.py. Suppose we have multiple files , say test_sample2.py , test_sample3.py. To run all the tests from all the files in the folder and subfolders we need to just run the pytest command.

To run tests only from a specific file, we can use py.test <filename>

Run a subset of entire test

Sometimes we don’t want to run the entire test suite. Pytest allows us to run specific tests. We can do it in 2 ways

  • Grouping of test names by substring matching
  • Grouping of tests by markers

if we have:

  • test_sample1.py
    • test_file1_method1()
    • test_file1_method2()
  • test_sample2.py
    • test_file2_method1()
    • test_file2_method2()

Option 1) Run tests by substring matching

Here to run all the tests having method1 in its name we have to run

py.test -k method1 -v

-k <expression> is used to represent the substring to match -v increases the verbosity

Option 2) Run tests by markers

Pytest allows us to set various attributes for the test methods using pytest markers, @pytest.mark . To use markers in the test file, we need to import pytest on the test files.

Here we will apply different marker names to test methods and run specific tests based on marker names. We can define the markers on each test names by using @pytest.mark.<name>.

We can run the marked test by py.test -m <name> -m <name> mentions the marker name

Running tests in parallel

Usually, a test suite will have multiple test files and hundreds of test methods which will take a considerable amount of time to execute. Pytest allows us to run tests in parallel.

For that we need to first install pytest-xdist

You can run tests now by

py.test -n 4