Reference¶
Commands¶
warn-transformer¶
Consolidate, enrich and republish the data gathered by warn-scraper.
warn-transformer [OPTIONS] COMMAND [ARGS]...
consolidate¶
Consolidate raw data using a common data schema.
warn-transformer consolidate [OPTIONS]
Options
- --input-dir <input_dir>¶
The path where the raw result files are located
- --source <source>¶
The source to consolidate. Default is all sources.
- -l, --log-level <log_level>¶
Set the logging level
- Options:
DEBUG | INFO | WARNING | ERROR | CRITICAL
download¶
Download all the CSVs in the WARN Notice project on biglocalnews.org.
warn-transformer download [OPTIONS]
Options
- --download-dir <download_dir>¶
The path where the results will be downloaded
- --source <source>¶
The source to download. Default is all sources.
- -l, --log-level <log_level>¶
Set the logging level
- Options:
DEBUG | INFO | WARNING | ERROR | CRITICAL
integrate¶
Integrate the latest consolidated data with the current database.
warn-transformer integrate [OPTIONS]
Options
- --input-dir <input_dir>¶
The path where the new results are located
- --init¶
- -l, --log-level <log_level>¶
Set the logging level
- Options:
DEBUG | INFO | WARNING | ERROR | CRITICAL
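The three subcommands above read like successive stages of one pipeline. The sketch below only assembles their argument lists from the documented flags; it does not run anything. The ordering (download, then consolidate, then integrate), the shared directory, and the `./warn-data` path are assumptions made for illustration, not something this reference states.

```python
import shlex
from typing import List

# Hypothetical working directory; substitute your own path.
DATA_DIR = "./warn-data"


def build_pipeline(data_dir: str, log_level: str = "INFO") -> List[List[str]]:
    """Return one argument list per subcommand, in the assumed order."""
    return [
        ["warn-transformer", "download", "--download-dir", data_dir, "-l", log_level],
        ["warn-transformer", "consolidate", "--input-dir", data_dir, "-l", log_level],
        ["warn-transformer", "integrate", "--input-dir", data_dir, "-l", log_level],
    ]


commands = build_pipeline(DATA_DIR)
for argv in commands:
    # shlex.join renders the list as a copy-pasteable shell command.
    print(shlex.join(argv))
```

Each list can be handed to `subprocess.run(argv, check=True)` once the package is installed and any credentials the download step needs are configured.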
Schema¶
- class warn_transformer.schema.BaseTransformer(input_dir: Path)¶
Transform a state’s raw data for consolidation.
- check_if_amendment(row: Dict) → bool¶
Determine whether a row is an amendment or not.
- Parameters:
row (dict) – The raw row of data.
Returns: A boolean
- check_if_closure(row: Dict) → Optional[bool]¶
Determine whether a row is a closure or not.
- Parameters:
row (dict) – The raw row of data.
Returns: A boolean or None
- check_if_temporary(row: Dict) → Optional[bool]¶
Determine whether a row is temporary or not.
- Parameters:
row (dict) – The raw row of data.
Returns: A boolean or None
- get_hash_id(data: Dict) → str¶
Convert the row into a unique hexdigest to use as a unique identifier.
- Parameters:
data (dict) – One raw row of data from the source
Returns: A unique hexdigest string computed from the source data.
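One way to derive a stable hexdigest from a row is to serialize it deterministically and hash the bytes. This is a minimal sketch of that idea, not the library's implementation: the choice of JSON with sorted keys and of SHA-256 are both assumptions, and the sample field names are hypothetical.

```python
import hashlib
import json
from typing import Dict


def get_hash_id(data: Dict) -> str:
    """Return a hexdigest that depends only on the row's keys and values."""
    # Sorting keys makes the digest independent of dictionary ordering.
    # default=str lets non-JSON values (dates, for example) be serialized.
    payload = json.dumps(data, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical raw rows with the same content in a different key order.
row_a = {"company": "Acme", "jobs": "25"}
row_b = {"jobs": "25", "company": "Acme"}
same_id = get_hash_id(row_a) == get_hash_id(row_b)
```

Because the identifier is computed from the raw values, any change to a source row produces a different identifier.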
- get_raw_data() → List[Dict]¶
Get the raw data from our scraper for this source.
Returns: A list of raw rows of data from the source.
- get_raw_value(row, method)¶
Fetch a value from the row for transformation.
- Parameters:
row – One raw row of data from the source
method – The technique to use to pull data. If a string is provided, it is used to fetch the key of that name from the row. If a callable is provided, the row is passed through it.
Returns: A value ready for transformation.
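The string-or-callable dispatch described for `method` can be sketched as a standalone function. This illustrates the documented behavior rather than reproducing the library's code; the error raised for other types and the sample field names are assumptions.

```python
from typing import Any, Callable, Dict, Union


def get_raw_value(row: Dict, method: Union[str, Callable[[Dict], Any]]) -> Any:
    """Pull a value from a row by key name or by running a function."""
    if isinstance(method, str):
        # A string is treated as the name of a key in the row.
        return row[method]
    if callable(method):
        # A callable receives the whole row and returns the value.
        return method(row)
    # Assumption: anything else is rejected. The real method may differ.
    raise TypeError(f"Unsupported method type: {type(method).__name__}")


# Hypothetical raw row.
row = {"Company Name": "Acme", "City": "Reno", "State": "NV"}
by_key = get_raw_value(row, "Company Name")
by_func = get_raw_value(row, lambda r: f"{r['City']}, {r['State']}")
```

The callable form is useful when a single output field has to be assembled from several source columns.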
- handle_amendments(row_list: List[Dict]) → List[Dict]¶
Remove amended filings from the provided list of records.
- Parameters:
row_list (list) – A list of clean rows of data.
Returns: A list of cleaned data, minus amended records.
- prep_row_list(row_list: List[Dict]) → List[Dict]¶
Make necessary transformations to the raw row list prior to transformation.
- Parameters:
row_list (list) – A list of raw rows of data from the source.
Returns: The row list minus empty records
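Dropping empty records could look like the sketch below. What counts as "empty" is an assumption here (every value is None or whitespace), and the function name `drop_empty_rows` is hypothetical; the library's own rule may be stricter or looser.

```python
from typing import Dict, List


def drop_empty_rows(row_list: List[Dict]) -> List[Dict]:
    """Return only the rows that contain at least one non-blank value."""

    def has_content(row: Dict) -> bool:
        # None and whitespace-only strings are treated as blank.
        return any(str(v).strip() for v in row.values() if v is not None)

    return [row for row in row_list if has_content(row)]


# Hypothetical raw rows, three of which carry no data.
raw_rows = [
    {"company": "", "jobs": None},
    {"company": "Acme", "jobs": "25"},
    {},
    {"company": "   ", "jobs": ""},
]
kept = drop_empty_rows(raw_rows)
```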
- schema¶
alias of WarnNoticeSchema
- transform() → List[Dict]¶
Transform prepared rows into a form that’s ready for consolidation.
Returns: A validated list of dictionaries that conform to our schema
- transform_company(value: str) → str¶
Transform a raw company name.
- Parameters:
value (str) – The raw company string provided by the source
Returns: A string object ready for consolidation.
- transform_date(value: str) → Optional[str]¶
Transform a raw date string into a date object.
- Parameters:
value (str) – The raw date string provided by the source
Returns: A date object ready for consolidation, or None if the date string is invalid.
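The parse-or-None pattern can be sketched with the standard library. The `date_format` parameter and its `%m/%d/%Y` default are assumptions for illustration. The sketch returns an ISO-formatted string to match the annotated `Optional[str]` return type, although the description above speaks of a date object, so the real method may behave differently.

```python
from datetime import datetime
from typing import Optional


def transform_date(value: str, date_format: str = "%m/%d/%Y") -> Optional[str]:
    """Parse a raw date string, returning None when it does not match."""
    try:
        parsed = datetime.strptime(value.strip(), date_format)
    except ValueError:
        # strptime raises ValueError for text that does not fit the format.
        return None
    return parsed.date().isoformat()


good = transform_date("01/02/2022")
bad = transform_date("not a date")
```

Returning None instead of raising lets a caller keep the rest of the row and decide later how to treat the missing date.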
- transform_jobs(value: str) → Optional[int]¶
Transform a raw jobs number into an integer.
- Parameters:
value (str) – A raw jobs number provided by the source
Returns: An integer ready for consolidation, or None if the value is invalid.
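A comparable sketch for job counts follows. The cleanup rules (stripping whitespace and thousands separators, truncating decimals) are assumptions; sources vary, and the library may apply different or additional corrections.

```python
from typing import Optional


def transform_jobs(value: str) -> Optional[int]:
    """Convert a raw jobs figure to an integer, or None if it is unusable."""
    cleaned = value.strip().replace(",", "")
    try:
        # Going through float accepts inputs such as "45.0";
        # int() then truncates any fractional part.
        return int(float(cleaned))
    except (ValueError, OverflowError):
        # ValueError: non-numeric text, empty strings, or "nan".
        # OverflowError: "inf" cannot be converted to an integer.
        return None


with_comma = transform_jobs("1,200")
invalid = transform_jobs("N/A")
```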
- transform_location(value: str) → str¶
Transform a raw location.
- Parameters:
value (str) – The raw location string provided by the source
Returns: A string object ready for consolidation.
- transform_row(row: Dict) → Dict¶
Transform a row into a form that’s ready for consolidation.
- Parameters:
row (dict) – One raw row of data from the source
Returns: A transformed dict that’s ready to be loaded into our consolidated schema.
- class warn_transformer.schema.WarnNoticeSchema(*, only: types.StrSequenceOrSet | None = None, exclude: types.StrSequenceOrSet = (), many: bool = False, context: dict | None = None, load_only: types.StrSequenceOrSet = (), dump_only: types.StrSequenceOrSet = (), partial: bool | types.StrSequenceOrSet = False, unknown: str | None = None)¶
A standardized instance of a WARN Act Notice.
Utils¶
- warn_transformer.utils.get_all_transformers() → List[str]¶
Get all the states and territories that have scrapers.
Returns: List of lower-case postal abbreviations.