Reference

Commands

warn-transformer

Consolidate, enrich and republish the data gathered by warn-scraper.

warn-transformer [OPTIONS] COMMAND [ARGS]...

consolidate

Consolidate raw data using a common data schema.

warn-transformer consolidate [OPTIONS]

Options

--input-dir <input_dir>

The path where the raw result files are located

--source <source>

The source to consolidate. Default is all sources.

-l, --log-level <log_level>

Set the logging level

Options

DEBUG | INFO | WARNING | ERROR | CRITICAL

download

Download all the CSVs in the WARN Notice project on biglocalnews.org.

warn-transformer download [OPTIONS]

Options

--download-dir <download_dir>

The path where the results will be downloaded

--source <source>

The source to download. Default is all sources.

-l, --log-level <log_level>

Set the logging level

Options

DEBUG | INFO | WARNING | ERROR | CRITICAL

integrate

Integrate the latest consolidated data with the current database.

warn-transformer integrate [OPTIONS]

Options

--input-dir <input_dir>

The path where the new results are located

--init

-l, --log-level <log_level>

Set the logging level

Options

DEBUG | INFO | WARNING | ERROR | CRITICAL
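
The three subcommands above can be chained from a script. The following is a minimal sketch, not an official recipe. It assumes the commands are meant to run in the order download, consolidate, integrate, that consolidate reads from the download directory, and that the caller knows where the consolidated output lives. The helper names build_pipeline and run_pipeline and the directory arguments are hypothetical; only the flags come from the reference above.

```python
import subprocess
from typing import List, Optional


def build_pipeline(
    raw_dir: str,
    consolidated_dir: str,
    source: Optional[str] = None,
    log_level: str = "INFO",
) -> List[List[str]]:
    """Assemble argument lists for download, consolidate and integrate."""
    download = ["warn-transformer", "download", "--download-dir", raw_dir]
    consolidate = ["warn-transformer", "consolidate", "--input-dir", raw_dir]
    # Assumption: the caller supplies the directory holding consolidated output.
    integrate = ["warn-transformer", "integrate", "--input-dir", consolidated_dir]
    if source is not None:
        # Only download and consolidate document a --source option.
        download += ["--source", source]
        consolidate += ["--source", source]
    # Every subcommand documents --log-level.
    return [cmd + ["--log-level", log_level] for cmd in (download, consolidate, integrate)]


def run_pipeline(commands: List[List[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Building the argument lists separately from running them makes it possible to inspect or log the exact commands before anything is executed.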

Schema

class warn_transformer.schema.BaseTransformer(input_dir: Path)

Transform a state’s raw data for consolidation.

check_if_amendment(row: Dict) → bool

Determine whether a row is an amendment or not.

Parameters

row (dict) – The raw row of data.

Returns: A boolean

check_if_closure(row: Dict) → Optional[bool]

Determine whether a row is a closure or not.

Parameters

row (dict) – The raw row of data.

Returns: A boolean or None

check_if_temporary(row: Dict) → Optional[bool]

Determine whether a row is temporary or not.

Parameters

row (dict) – The raw row of data.

Returns: A boolean or None

get_hash_id(dict: Dict) → str

Convert the row into a hexdigest to use as a unique identifier.

Parameters

dict (dict) – One raw row of data from the source

Returns: A unique hexdigest string computed from the source data.
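
One way to derive such an identifier is sketched below. This is an illustration only: the JSON serialization and the choice of SHA-256 are assumptions, and the library's actual implementation may differ.

```python
import hashlib
import json
from typing import Dict


def get_hash_id(row: Dict) -> str:
    """Return a stable hexdigest for one raw row (illustrative only)."""
    # Sort keys so the same data always serializes to the same string,
    # regardless of the order in which the keys were inserted.
    serialized = json.dumps(row, sort_keys=True, default=str)
    # Assumption: SHA-256 is used here for illustration; the real method may use another hash.
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()
```

Because the keys are sorted before hashing, two rows with identical content produce the same identifier even if their columns arrive in a different order.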

get_raw_data() → List[Dict]

Get the raw data from our scraper for this source.

Returns: A list of raw rows of data from the source.

get_raw_value(row, method)

Fetch a value from the row for transformation.

Parameters
  • row – One raw row of data from the source

  • method – The technique to use to pull data. If a string is provided, it is used to fetch the key of that name from the row. If a callable is provided, the row is passed through it.

Returns: A value ready for transformation.
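
The string-or-callable dispatch described above can be sketched as a standalone function. This is a minimal illustration under the assumption that the behavior matches the parameter description; the error raised for unsupported types is hypothetical.

```python
from typing import Any, Callable, Dict, Union


def get_raw_value(row: Dict, method: Union[str, Callable[[Dict], Any]]) -> Any:
    """Pull a value from a row by key name or by custom function."""
    if isinstance(method, str):
        # A string is treated as the name of a key in the row.
        return row[method]
    if callable(method):
        # A callable receives the whole row and returns the value.
        return method(row)
    # Assumption: anything else is rejected; the real method may behave differently.
    raise TypeError("method must be a string or a callable")


row = {"Company Name": "Acme", "City": "Reno", "State": "NV"}
company = get_raw_value(row, "Company Name")
location = get_raw_value(row, lambda r: f"{r['City']}, {r['State']}")
```

The callable form is useful when a value has to be assembled from several source columns, as with the location in the usage lines.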

handle_amendments(row_list: List[Dict]) → List[Dict]

Remove amended filings from the provided list of records.

Parameters

row_list (list) – A list of clean rows of data.

Returns: A list of cleaned data, minus amended records.

prep_row_list(row_list: List[Dict]) → List[Dict]

Make necessary transformations to the raw row list prior to transformation.

Parameters

row_list (list) – A list of raw rows of data from the source.

Returns: The row list minus empty records
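
Dropping empty records might look like the sketch below. The definition of "empty" used here (every value is None or blank) is an assumption, and the library may apply additional preparation steps.

```python
from typing import Dict, List


def prep_row_list(row_list: List[Dict]) -> List[Dict]:
    """Return the rows that contain at least one non-blank value."""
    prepped = []
    for row in row_list:
        # Assumption: a row is empty when every value is None or whitespace.
        if any(str(v).strip() for v in row.values() if v is not None):
            prepped.append(row)
    return prepped
```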

schema

alias of WarnNoticeSchema

transform() → List[Dict]

Transform prepared rows into a form that’s ready for consolidation.

Returns: A validated list of dictionaries that conform to our schema

transform_company(value: str) → str

Transform a raw company name.

Parameters

value (str) – The raw company string provided by the source

Returns: A string object ready for consolidation.

transform_date(value: str) → Optional[str]

Transform a raw date string into a standardized date string.

Parameters

value (str) – The raw date string provided by the source

Returns: A date string ready for consolidation, or None if the date string is invalid.
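
A date transformation with this signature can be sketched as follows. The date_format parameter and its default are hypothetical, and returning None on a parse failure follows the description above rather than a reading of the library's source.

```python
from datetime import datetime
from typing import Optional


def transform_date(value: str, date_format: str = "%m/%d/%Y") -> Optional[str]:
    """Parse a raw date string and return it in ISO format, or None."""
    try:
        # Assumption: the source publishes dates in a single known format.
        dt = datetime.strptime(value.strip(), date_format)
    except ValueError:
        # The raw string did not match the expected format.
        return None
    return str(dt.date())
```

Returning a string rather than a date object matches the Optional[str] annotation in the signature.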

transform_jobs(value: str) → Optional[int]

Transform a raw jobs number into an integer.

Parameters

value (str) – A raw jobs number provided by the source

Returns: An integer ready for consolidation, or None if the value is invalid.
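
The conversion can be sketched as below. Stripping whitespace and thousands separators is an assumption about what raw values look like; the library's own cleaning rules may be broader.

```python
from typing import Optional


def transform_jobs(value: str) -> Optional[int]:
    """Convert a raw jobs count to an integer, or None if it cannot be parsed."""
    # Assumption: raw values may carry surrounding whitespace and comma separators.
    cleaned = value.strip().replace(",", "")
    try:
        return int(cleaned)
    except ValueError:
        # Blank strings and non-numeric text are treated as invalid.
        return None
```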

transform_location(value: str) → str

Transform a raw location.

Parameters

value (str) – The raw location string provided by the source

Returns: A string object ready for consolidation.

transform_row(row: Dict) → Dict

Transform a row into a form that’s ready for consolidation.

Parameters

row (dict) – One raw row of data from the source

Returns: A transformed dict that’s ready to be loaded into our consolidated schema.

class warn_transformer.schema.WarnNoticeSchema(*, only: types.StrSequenceOrSet | None = None, exclude: types.StrSequenceOrSet = (), many: bool = False, context: dict | None = None, load_only: types.StrSequenceOrSet = (), dump_only: types.StrSequenceOrSet = (), partial: bool | types.StrSequenceOrSet = False, unknown: str | None = None)

A standardized instance of a WARN Act Notice.

Utils

warn_transformer.utils.get_all_transformers() → List[str]

Get all the states and territories that have transformers.

Returns: List of lower-case postal abbreviations.