pyrate.algorithms package

Submodules

pyrate.algorithms.aisparser module

Parses the AIS data from CSV or XML files and populates the AIS database.

pyrate.algorithms.aisparser.check_imo(imo)
pyrate.algorithms.aisparser.float_or_null(s)
pyrate.algorithms.aisparser.get_data_source(name)

Guesses data source from file name.

If the name contains ‘terr’ then we guess terrestrial data, otherwise we assume satellite.

Parameters: name (str) – File name
Returns: 0 if satellite, 1 if terrestrial
Return type: int
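
The rule above can be sketched as follows. This is a minimal illustration, not the package's implementation: the function name guess_data_source and the sample file names are hypothetical, and the sketch assumes a plain, case-sensitive substring test.

```python
def guess_data_source(name):
    """Return 1 for terrestrial data, 0 for satellite data.

    Assumption: a case-sensitive substring test for 'terr'; the real
    function may normalise the file name differently.
    """
    return 1 if 'terr' in name else 0


# Hypothetical file names, for illustration only.
print(guess_data_source('ais_terr_2013_01.csv'))
print(guess_data_source('ais_sat_2013_01.csv'))
```

Any name that does not contain the marker falls through to the satellite default, which matches the default source=0 used elsewhere in this module.
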
pyrate.algorithms.aisparser.imostr(s)
pyrate.algorithms.aisparser.int_or_null(s)
pyrate.algorithms.aisparser.longstr(s)
pyrate.algorithms.aisparser.parse_file(fp, name, ext, baddata_logfile, cleanq, dirtyq, source=0)

Parses a file containing AIS data, placing rows of data onto queues

Parameters:
  • fp (str) – Filepath of file to be parsed
  • name (str) – Name of file to be parsed
  • ext (str) – Extension, either ‘.csv’ or ‘.xml’
  • baddata_logfile (str) – Name of the logfile
  • cleanq – Queue for messages to be inserted into clean table
  • dirtyq – Queue for messages to be inserted into dirty table
  • source (int, optional, default=0) – 0 is satellite, 1 is terrestrial
Returns:

  • invalid_ctr (int) – Number of invalid rows
  • clean_ctr (int) – Number of clean rows
  • dirty_ctr (int) – Number of dirty rows
  • time_elapsed (time) – The time elapsed since starting the parse_file procedure
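
The queue-and-counter pattern described above might look like the sketch below. It makes several assumptions: the name route_rows, the convert and is_clean callables, and the use of ValueError to signal a rejected row are all hypothetical, and the criteria that separate clean from dirty rows are not documented here.

```python
import queue
import time


def route_rows(rows, convert, is_clean, cleanq, dirtyq):
    """Convert rows, place them on a queue, and return the counters.

    Hypothetical helper: `convert` raises ValueError to reject a row,
    and `is_clean` decides which queue a converted row goes to.
    """
    start = time.time()
    invalid_ctr = clean_ctr = dirty_ctr = 0
    for row in rows:
        try:
            converted = convert(row)
        except ValueError:
            invalid_ctr += 1  # rejected rows are counted but not queued
            continue
        if is_clean(converted):
            cleanq.put(converted)
            clean_ctr += 1
        else:
            dirtyq.put(converted)
            dirty_ctr += 1
    return invalid_ctr, clean_ctr, dirty_ctr, time.time() - start


cleanq, dirtyq = queue.Queue(), queue.Queue()
# Toy data: integers stand in for AIS rows, positive values count as clean.
counters = route_rows(['1', 'x', '-5'], int, lambda v: v > 0, cleanq, dirtyq)
print(counters[:3])
```

Keeping the counters alongside the queues lets the caller report how many rows were rejected without inspecting the queues themselves.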

pyrate.algorithms.aisparser.parse_raw_row(row)

Parse values from row, returning a new dict with the values converted into appropriate types. Throws an exception to reject the row.

Parameters: row (dict) – A dictionary of headers and values from the CSV file
Returns: converted_row – A dictionary of headers and values converted using the helper functions
Return type: dict
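
A minimal sketch of this kind of conversion is shown below. The column names, the CONVERTERS mapping, and the helpers int_or_none and float_or_none are hypothetical stand-ins; the behaviour of the module's own int_or_null and float_or_null helpers is not documented here.

```python
def int_or_none(s):
    """Hypothetical helper: int value, or None for an empty field."""
    return int(s) if s not in (None, '') else None


def float_or_none(s):
    """Hypothetical helper: float value, or None for an empty field."""
    return float(s) if s not in (None, '') else None


# Hypothetical column-to-converter mapping.
CONVERTERS = {
    'MMSI': int,                 # required: a bad value raises ValueError
    'IMO': int_or_none,          # optional: empty becomes None
    'Longitude': float_or_none,
    'Latitude': float_or_none,
}


def convert_row(row):
    """Return a new dict of converted values; raise to reject the row."""
    return {col: conv(row[col]) for col, conv in CONVERTERS.items()}


raw = {'MMSI': '235000001', 'IMO': '', 'Longitude': '-1.5', 'Latitude': '50.9'}
print(convert_row(raw))
```

Because a new dict is built, the input row is left unchanged, and a failed conversion of a required column propagates as an exception for the caller to handle.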
pyrate.algorithms.aisparser.parse_timestamp(s)
pyrate.algorithms.aisparser.readcsv(fp)

Reads each line in the CSV file, checks that all columns are available, and yields a dictionary of the subset of columns required (as per AIS_CSV_COLUMNS).

If a row is invalid (too few columns), an empty dictionary is yielded instead.

Parameters: fp (str) – File path
Yields: rowsubset (dict) – A dictionary of the subset of columns as per AIS_CSV_COLUMNS
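
The generator behaviour can be sketched with the standard csv module. This is an illustration under assumptions: REQUIRED_COLUMNS is a hypothetical stand-in for AIS_CSV_COLUMNS (whose contents are not listed here), and the sketch takes an iterable of lines rather than a file path so that it runs without a file on disk.

```python
import csv
import io

# Hypothetical subset; the real list is AIS_CSV_COLUMNS.
REQUIRED_COLUMNS = ['MMSI', 'Time', 'Latitude', 'Longitude']


def read_rows(lines):
    """Yield a dict of the required columns for each CSV row.

    `lines` is any iterable of CSV text lines (an open file works).
    A row with too few columns yields an empty dict.
    """
    for row in csv.DictReader(lines):
        # DictReader fills missing trailing fields with None.
        if any(row.get(col) is None for col in REQUIRED_COLUMNS):
            yield {}
        else:
            yield {col: row[col] for col in REQUIRED_COLUMNS}


sample = io.StringIO(
    "MMSI,Time,Latitude,Longitude,Extra\n"
    "235000001,2013-01-01 00:00:00,50.9,-1.5,x\n"
    "235000002,2013-01-01 00:01:00\n"
)
for subset in read_rows(sample):
    print(subset)
```

Yielding an empty dictionary rather than skipping the row lets the caller count invalid rows.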
pyrate.algorithms.aisparser.readxml(fp)
pyrate.algorithms.aisparser.run(inp, out, dropindices=True, source=0)

Populate the AIS_Raw database with messages from the AIS csv files

Parameters:
  • inp (str) – The name of the input repository or repositories, as defined in the global variable INPUTS
  • out (str) – The name of the output repository or repositories, as defined in the global variable OUTPUTS
  • dropindices (bool, optional, default=True) – Drop indexes for faster insert
  • source (int, optional, default=0) – Indicates terrestrial (1) or satellite data (0)
pyrate.algorithms.aisparser.set_null_on_fail(row, col, test)

Helper function that sets a column in a row of data to null if its value fails validation.

Parameters:
  • row (dict) – A dictionary of the fields
  • col (str) – The column to check
  • test (func) – One of the validation functions in pyrate.utils
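
A sketch of such a helper follows. The names null_on_fail and valid_latitude are hypothetical, the validator stands in for one of the functions in pyrate.utils, and modifying the row in place is an assumption.

```python
def null_on_fail(row, col, test):
    """Set row[col] to None when a present value fails `test`.

    Assumption: the row is modified in place and nothing is returned.
    """
    if row.get(col) is not None and not test(row[col]):
        row[col] = None


def valid_latitude(lat):
    """Hypothetical validator; stands in for one from pyrate.utils."""
    return -90.0 <= lat <= 90.0


row = {'Latitude': 123.4, 'Longitude': -1.5}
null_on_fail(row, 'Latitude', valid_latitude)
print(row)
```

Nulling a single bad field keeps the rest of the row usable, instead of rejecting the whole message.
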
pyrate.algorithms.aisparser.validate_row(row)
pyrate.algorithms.aisparser.xml_name_to_csv(name)

Converts a tag name from an XML file to the corresponding name from CSV.

pyrate.algorithms.imolist module

pyrate.algorithms.imolist.create_imo_list(aisdb)

Create the imo list table from MMSI, IMO pairs in clean and dirty tables.

This method collects the unique MMSI, IMO pairs from a table, and the time intervals over which they have been seen in the data. These tuples are then upserted into the imo_list table.

Removes cases where ships have clashing MMSI numbers within a time threshold.

On the clean table, pairs with no IMO number are also collected to get the activity intervals of MMSI numbers. On the dirty table, only messages specifying an IMO are collected.

Parameters: aisdb (postgresdb) – The database upon which to operate
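
The collection step can be illustrated in plain Python. This is only a sketch: the real function operates on database tables, the name collect_pair_intervals and the (mmsi, imo, timestamp) tuple layout are hypothetical, and neither the clash removal nor the upsert is modelled.

```python
from datetime import datetime


def collect_pair_intervals(messages):
    """Map each (mmsi, imo) pair to its (first_seen, last_seen) interval.

    `messages` is an iterable of (mmsi, imo, timestamp) tuples; imo may
    be None for messages that carry no IMO number.
    """
    intervals = {}
    for mmsi, imo, ts in messages:
        key = (mmsi, imo)
        if key in intervals:
            first, last = intervals[key]
            intervals[key] = (min(first, ts), max(last, ts))
        else:
            intervals[key] = (ts, ts)
    return intervals


# Hypothetical messages, for illustration only.
messages = [
    (235000001, 9000001, datetime(2013, 1, 1)),
    (235000001, 9000001, datetime(2013, 1, 3)),
    (235000001, None, datetime(2013, 1, 2)),
]
for pair, interval in collect_pair_intervals(messages).items():
    print(pair, interval)
```

Pairs with no IMO number are kept under their own key, which corresponds to collecting MMSI activity intervals from the clean table.
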
pyrate.algorithms.imolist.run(_, out)

pyrate.algorithms.vesselimporter module

Extracts a subset of clean ships into ais_extended tables

pyrate.algorithms.vesselimporter.cluster_table(aisdb, table)

Clusters the PostgreSQL table on its MMSI index.

This process significantly improves the runtime of extended table generation.

pyrate.algorithms.vesselimporter.filter_good_ships(aisdb)

Generate a set of IMO numbers and (mmsi, imo) validity intervals for ships which are deemed to be ‘clean’. A clean ship is defined as one for which all of the following hold:

  • Has valid MMSI numbers associated with it.
  • For each MMSI number, the period of time it is associated with this IMO (via message number 5) overlaps with the period the MMSI number was in use.
  • For each MMSI number, its usage period does not overlap with that of any other of this ship’s MMSI numbers.
  • None of its MMSI numbers has been used by another ship (i.e. no other IMO number is also associated with that MMSI).
Returns:
  • valid_imos – A set of valid imo numbers
  • imo_mmsi_intervals – A list of (mmsi, imo, start, end) tuples, describing the validity intervals of each (mmsi, imo) pair
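
The last two criteria above can be sketched in plain Python. This is a partial illustration under assumptions: the names overlaps and clean_imos are hypothetical, the input is taken to be a list of (mmsi, imo, start, end) tuples, intervals are treated as closed, and the first two criteria are not checked.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True when two closed intervals share at least one instant."""
    return a_start <= b_end and b_start <= a_end


def clean_imos(pair_intervals):
    """Return the IMO numbers that pass the last two criteria.

    `pair_intervals` is a list of (mmsi, imo, start, end) tuples.
    """
    imos_by_mmsi = defaultdict(set)
    by_imo = defaultdict(list)
    for mmsi, imo, start, end in pair_intervals:
        imos_by_mmsi[mmsi].add(imo)
        by_imo[imo].append((mmsi, start, end))

    valid = set()
    for imo, entries in by_imo.items():
        # An MMSI associated with more than one IMO disqualifies the ship.
        shared = any(len(imos_by_mmsi[m]) > 1 for m, _, _ in entries)
        # Usage periods of the ship's different MMSIs must not overlap.
        clash = any(
            overlaps(s1, e1, s2, e2)
            for i, (m1, s1, e1) in enumerate(entries)
            for m2, s2, e2 in entries[i + 1:]
            if m1 != m2
        )
        if not shared and not clash:
            valid.add(imo)
    return valid


# Hypothetical data; integers stand in for timestamps.
pairs = [
    (111, 9000001, 0, 10), (112, 9000001, 20, 30),  # disjoint MMSI periods
    (221, 9000002, 0, 10), (222, 9000002, 5, 15),   # overlapping periods
    (331, 9000003, 0, 10), (331, 9000004, 20, 30),  # MMSI shared by two IMOs
]
print(clean_imos(pairs))
```

In this toy data only the first ship survives: the second has overlapping MMSI periods, and the third and fourth share an MMSI.
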
pyrate.algorithms.vesselimporter.generate_extended_table(aisdb, intervals, n_threads=2)
pyrate.algorithms.vesselimporter.get_remaining_interval(aisdb, mmsi, imo, start, end)
pyrate.algorithms.vesselimporter.insert_message_stream(aisdb, interval, msg_stream)

Takes a stream of messages for an MMSI over an interval, runs it through outlier detection and interpolation algorithms, then inserts the resulting stream into the ais_extended table.

pyrate.algorithms.vesselimporter.interval_copier(db_options, interval_q)
pyrate.algorithms.vesselimporter.process_interval_series(aisdb, interval)
pyrate.algorithms.vesselimporter.run(inp, out, n_threads=2, dropindices=False)
pyrate.algorithms.vesselimporter.upsert_interval_to_imolist(aisdb, mmsi, imo, start, end)

Module contents