openspi.utils

Functions

count_files(folder_path)

Counts the number of files in a given directory.

reformat_path(path)

Reformats a raw string file path (path\to\folder) to contain double backslashes (path\\to\\folder), as required by the R programming language.

identical_list_items(lst)

Checks if all the values in a given list are the same.

plastic_matches_checked(row, index)

Appends a note to a given list which marks the index of the subsequent plastic match.

nonpolymer_matches_checked(row, nested_list)

Checks a given dataframe (in the form of a nested list) to determine whether all matches are empty wells.

subsequent_matches_checked(df, nested_list)

Checks whether the first match (highest r value) for each sample is plastic and, if it is not, checks the subsequent matches and makes a note.

save_df_to_excel(excel_path, df, sheetname)

Saves a Pandas dataframe to an Excel file. If the Excel file does not yet exist, it will create it and save the sheet.

check_excel_sheet(excel_path, sheetname)

Checks if an Excel worksheet exists within the (already existing) Excel workbook and creates one if not.

empty_wells_count(df)

Counts how many times 'empty well' appears in a given dataframe.

matches_checked_sheet(excel_path[, nrel, n])

Adds a 'Notes' sheet with the number of nonpolymer matches and empty wells.

count_matches(df)

list_to_df_to_sheet(df_lst, columns_list, excel_path, ...)

Converts a nested list into a Pandas dataframe, which is then saved to an

Module Contents

openspi.utils.count_files(folder_path)

Counts the number of files in a given directory.

Parameters:

folder_path – The path to the directory.

Returns:

The number of files in the directory, or -1 if the directory does not exist.
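The documented behavior can be approximated with the standard library. This is a minimal sketch, not the openspi source: the name `count_files_sketch` is hypothetical, and the choice to count only regular files (skipping subfolders) is an assumption, since the documentation does not say how subfolders are treated.

```python
import os
import tempfile


def count_files_sketch(folder_path):
    """Return the number of regular files in folder_path, or -1 if it is not a directory."""
    if not os.path.isdir(folder_path):
        return -1
    # Assumption: only regular files are counted; subfolders are skipped.
    return sum(
        1
        for name in os.listdir(folder_path)
        if os.path.isfile(os.path.join(folder_path, name))
    )


# Usage: build a throwaway directory with two files and one subfolder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.csv", "b.csv"):
        open(os.path.join(tmp, name), "w").close()
    os.mkdir(os.path.join(tmp, "subfolder"))
    found = count_files_sketch(tmp)
    missing = count_files_sketch(os.path.join(tmp, "does_not_exist"))

print(found, missing)
```

Checking for the -1 sentinel before using the count avoids treating a mistyped path as an empty folder.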

openspi.utils.reformat_path(path)

Reformats a raw string file path (path\to\folder) to contain double backslashes (path\\to\\folder), as required by the R programming language. Has no effect if the path uses forward slashes (/).

Parameters:

path (str) – The path to be reformatted.

Returns:

reformatted_path – The path with every single backslash (\) replaced by a double backslash (\\).

Return type:

str
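The replacement described above amounts to a single string substitution. The following is a sketch under that reading, with the hypothetical name `reformat_path_sketch`; it is not taken from the openspi source.

```python
def reformat_path_sketch(path):
    """Double every backslash so the path can be pasted into R code."""
    # "\\" is one literal backslash in Python source; "\\\\" is two.
    return path.replace("\\", "\\\\")


windows_path = r"C:\data\spectra"
r_ready = reformat_path_sketch(windows_path)
unchanged = reformat_path_sketch("C:/data/spectra")

print(r_ready)
print(unchanged)
```

In this sketch, calling the function twice on the same path doubles the backslashes again, so it is applied once to the raw path.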

openspi.utils.identical_list_items(lst)

Checks if all the values in a given list are the same.

Parameters:

lst (list) – A list of values.

Returns:

Returns True if all the values are the same, False if not.

Return type:

bool
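One common way to express this check is to compare every item against the first one. The sketch below uses the hypothetical name `identical_list_items_sketch`. Its result for an empty list (True) is a property of the sketch, not a documented behavior of openspi.

```python
def identical_list_items_sketch(lst):
    """Return True if every item in lst equals the first item."""
    # For an empty list the generator yields nothing, so all() returns True
    # and lst[0] is never evaluated.
    return all(item == lst[0] for item in lst)


same = identical_list_items_sketch(["empty well", "empty well", "empty well"])
mixed = identical_list_items_sketch(["empty well", "polystyrene"])

print(same, mixed)
```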

openspi.utils.plastic_matches_checked(row, index)

Appends a note to a given list which marks the index of the subsequent plastic match.

Parameters:
  • row (list) – A list containing data for one library match for one file.

  • index (int) – The index of the subsequent plastic match (see subsequent_matches_checked).

Returns:

row – An updated version of the original row variable. It now contains an additional item.

Return type:

list

openspi.utils.nonpolymer_matches_checked(row, nested_list)

Checks a given dataframe (in the form of a nested list) to determine whether all matches are empty wells.

Parameters:
  • row (list) – A list containing data for one library match for one file.

  • nested_list (list) – A list of lists, where each list in the outer list represents a row.

Returns:

row – An updated version of the original row variable. It now contains an additional item.

Return type:

list

openspi.utils.subsequent_matches_checked(df, nested_list)

Checks whether the first match (highest r value) for each sample is plastic. If it is not, the function checks the subsequent matches for plastic matches and notes the place (1st, 2nd, etc.) of any it finds. If there are no plastic matches, it checks whether all subsequent matches are empty wells and notes that. Otherwise, it notes that all subsequent matches are ‘nonpolymer’.

Parameters:
  • df (df) – A Pandas dataframe.

  • nested_list (list) – A list of lists, where each list in the outer list represents a row.

Returns:

nested_list – An updated version of the original nested_list. Each inner list is now one item longer.

Return type:

list
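The decision order described for this function can be illustrated on one sample's matches. This is only a sketch of the logic, not the openspi implementation: the function name, the `is_plastic` predicate, the note wording, and counting the place from the top of the ranking are all assumptions. Only the 'empty well' label comes from this documentation.

```python
def classify_matches_sketch(identities, is_plastic):
    """Return a note for one sample.

    identities: match labels ordered from highest to lowest r value.
    is_plastic: hypothetical predicate that decides whether a label is a plastic.
    """
    if is_plastic(identities[0]):
        return "first match is plastic"
    # Look for a plastic among the lower-ranked matches; place 2 is the
    # second-best match (assumed numbering).
    for place, label in enumerate(identities[1:], start=2):
        if is_plastic(label):
            return f"plastic match at place {place}"
    if all(label == "empty well" for label in identities[1:]):
        return "all subsequent matches are empty wells"
    return "all subsequent matches are nonpolymer"


# Hypothetical set of plastic labels, for illustration only.
plastics = {"polystyrene", "polyethylene"}

def looks_plastic(label):
    return label in plastics

note_a = classify_matches_sketch(["polystyrene", "cellulose"], looks_plastic)
note_b = classify_matches_sketch(["cellulose", "empty well", "polyethylene"], looks_plastic)
note_c = classify_matches_sketch(["cellulose", "empty well", "empty well"], looks_plastic)
note_d = classify_matches_sketch(["cellulose", "chitin", "empty well"], looks_plastic)

for note in (note_a, note_b, note_c, note_d):
    print(note)
```

In the documented function the resulting note is appended to each inner list of nested_list, which is why every inner list comes back one item longer.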

openspi.utils.save_df_to_excel(excel_path, df, sheetname)

Saves a Pandas dataframe to an Excel file. If the Excel file does not yet exist, it will create it and save the sheet.

Parameters:
  • excel_path (str) – The full path to an .xlsx file.

  • df (df) – The dataframe to be saved as an Excel sheet.

  • sheetname (str) – The name of the Excel sheet.

Return type:

None.

openspi.utils.check_excel_sheet(excel_path, sheetname)

Checks if an Excel worksheet exists within the (already existing) Excel workbook and creates one if not. Note that this differs from save_df_to_excel: it returns a worksheet object that is used for directly editing cells.

Parameters:
  • excel_path (str) – The full path to an already existing .xlsx file.

  • sheetname (str) – The name of the Excel sheet.

Returns:

ws – An Excel sheet with the desired sheetname.

Return type:

openpyxl worksheet object

openspi.utils.empty_wells_count(df)

Counts how many times ‘empty well’ appears in a given dataframe.

Parameters:

df (df) – A Pandas dataframe.

Returns:

count – The number of times 'empty well' appears in the 'spectrum_identity' column.

Return type:

int

openspi.utils.matches_checked_sheet(excel_path, nrel=False, n=5)

Adds a ‘Notes’ sheet with the number of nonpolymer matches and empty wells.

Parameters:
  • excel_path (str) – The full path to an .xlsx file.

  • nrel (bool) – If True, the function will also check the polymer count of the first well individually. Default is False.

  • n (int) – The number of top matches for each file. Equal to top_n in openspi_main. Default is 5.

Return type:

None.

openspi.utils.count_matches(df)

openspi.utils.list_to_df_to_sheet(df_lst, columns_list, excel_path, sheet_name)

Converts a nested list into a Pandas dataframe, which is then saved to an Excel worksheet.

Parameters:
  • excel_path (str) – The full path to an .xlsx file.

  • df_lst (list) – The nested list.

  • columns_list (list) – The list of column names.

  • sheet_name (str) – The name of the Excel sheet.

Return type:

None.