replicate_regression_omics(data_file, foptions_file, base_directory)

Bayesian replicate regression for omics data

FUNCTION ARGUMENTS:
  data_file:                 omics data file (full directory path)
  foptions_file:             table file containing the options (full directory path)
  base_directory:            directory name for results (full directory path)
  goptions_file (optional):  table file containing the options for single protein
                             graphics (full directory path)

OUTPUT:
  Data and graphics are written to files

--------------------------------------------------------------------------------
Format of data file (tab-separated text, see examples):

  Line 1: Headers
    Headers of protein names columns (e.g., !BSUnumber, !BGnumber, !GeneName,
    !UniprotID), followed by sample names (as headers of data columns)
  Line 2: Time points
    first column: !Time
    data columns: time points (numbers)
  Line 3: Replicate numbers
    first column: !Replicate
    data columns: replicate names
  Line 4 (optional): !ValueType ('Value', 'Mean', or 'Std')
  Further lines: numerical data

--------------------------------------------------------------------------------
Format of options file (tab-separated text, see examples):

  Each line contains one attribute:
    first column:  attribute name
    second column: attribute value (string or number)
    all further columns are ignored
  Lines starting with the '%' character are ignored (can be used for comments)
  The attribute 'options_file' can be used to declare another options file
  containing default options
  The attribute 'data_file_csv' contains the name of the data file

--------------------------------------------------------------------------------
Attributes in options file (for replicate_regression_omics_analysis)

  data_dir                        directory name for data files
  result_dir                      directory name for result files
  graphics_dir                    directory name for graphics
  data_file_csv                   filename for data file (tsv format, see examples)
  data_file_matlab                filename for matlab data file (written during
                                  the analysis)
  options_file                    filename for default options file
                                  (tsv format)
  options_out_csv                 filename for completed options file (tsv format,
                                  written during analysis)
  translation_table_file          filename for ID mapping table (see example)
  result_file_matlab              filename for
  result_file_csv                 filename for hahne_salt_stress_cytosol_result.tsv
  result_file_zip                 filename for hahne_salt_stress_result.zip
  graphics_file                   file basename for graphics
  data_time_unit                  time unit ('min')
  data_scale                      'absolute' or 'log2' (also 'ln', 'log', 'log10',
                                  'log2 ratio'; these are all treated like 'log2')
  data_min_num_replicates         minimal number of valid replicates (genes with
                                  fewer valid replicates are discarded; default 1)

  For "absolute" data:
    abs_data_adjust_std_upper     upper threshold; points above are outliers
                                  (increase std dev by factor of 3)
    abs_data_adjust_std_lower     lower threshold; points below are outliers
                                  (increase std dev by factor of 3)
    data_std_relative             default for relative standard deviation
    data_std_minimal              minimal standard deviation

  For logarithmic data ('log2', 'ln', 'log', 'log10', 'log2 ratio'):
    data_std_log                  default for standard deviation (on log scale)
    log_data_adjust_std_threshold threshold for data values (on chosen log scale)
                                  for which std dev is modified (criterion:
                                  | [data value] - [median for this gene & replicate] | > threshold);
                                  for the inserted std dev, see next entry
    log_data_adjust_std_factor    new std width = factor * absolute deviation
                                  from median

  data_min_data_points            minimal number of data points required in the
                                  analysis (default 3); at least one replicate
                                  has to reach this number; points at times t<0
                                  do not count; replicates with fewer data points
                                  are ignored
  convert_to_logarithm            convert (nonlogarithmic) data to logarithms for
                                  replicate regression (Boolean)
  log_transformation              type of transformation
                                  'arithmetic': data = mean values and plotting
                                                on absolute scale
                                  'geometric':  data = median values and plotting
                                                on log scale (but data on
                                                absolute scale)
  ignore_std_deviations           Boolean, ignore standard deviations given in data
  basis                           string: type of basis
                                  functions
  fixed_prior                     keeping the prior fixed? (Boolean, default 0)
  prior_updating                  number of prior updating iterations (default 10)
  updating_factor                 prior updating factor, default 1.2
  updating_factor_final           prior updating factor before last regression;
                                  default [] (not set)
  update_prior_means              change parameter means from 0 to posterior means
                                  while updating? default 0
  t_smooth                        time constant defining how prior widths depend
                                  on the frequency
  options_start_value             fixed starting value
                                  -> to be inserted into options as
                                     options.start_value
  options_start_at_t              starting time point for changes (after constant
                                  behaviour)
                                  -> to be inserted into options as
                                     options.start_at_t
  options_constant_before_start   Boolean (keep curves constant before starting
                                  time)
                                  -> to be inserted into options as
                                     options.constant_before_start
  regression_t_interp             time points for regression (optional)
  regression_tmin                 start time for regression (optional)
  regression_tmax                 end time for regression (optional)
  crossvalidation                 run crossvalidation?
                                  (Boolean, default 0)
  postprocess_normalise           Boolean, default 1
  graphics_individual             file basename (used in script
                                  'replicate_regression_omics_selected')
  graphics_scale                  'log2' (default), 'linear'
  graphics_format                 'eps', 'png' (for technical reasons, 'eps' needs
                                  to be written in single quotes)
  convenience_name                type of protein names to be used in graphics
                                  (default 'SubtiWiki_20090701')
  normalise_by_median             (Boolean, default TRUE)
  mark_outliers_percentage        percentage of data points to be marked as
                                  outliers based on crossvalidation error

Additional attributes in options file for individual graphics
(function 'replicate_regression_omics_selected')

  graphics_scale                  'log2', 'linear'
  postprocess_normalise           1
  element_id                      id (or list, separated by |)
  element_name                    name (or list, separated by |)
  delimiter_symbol                symbol for delimiting list of elements
                                  (in element_id, element_name)
  title_string                    title for graphics
  x_label                         x label for graphics
  y_label                         y label for graphics
  plot_data                       produce plot for data (single element)
  plot_replicates                 produce plot with replicates (single element)
  plot_regression                 produce plot for regression curves (single element)
  plot_all                        produce joint plots for all elements
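
Example (illustration only, not part of the toolbox): the options file format described above (tab-separated, one attribute per line, '%' comment lines, further columns ignored, defaults declared via 'options_file') could be read along the following lines. This is a minimal Python sketch, independent of the toolbox's own reader. The function names read_options and parse_value are hypothetical. Two behaviours are assumptions that the help text does not state: values in the current file take precedence over those from the default options file, and a relative 'options_file' name is resolved against the directory of the file that declares it.

```python
from pathlib import Path


def parse_value(text):
    """Return a float if the text parses as a number, else the stripped string."""
    text = text.strip()
    try:
        return float(text)
    except ValueError:
        return text


def read_options(path, _seen=None):
    """Read a tab-separated options file into a dict (hypothetical helper).

    Assumptions (not stated in the help text): values in this file override
    values from the file named by 'options_file', and a relative
    'options_file' name is resolved against this file's directory.
    """
    path = Path(path)
    seen = set() if _seen is None else _seen
    resolved = path.resolve()
    if resolved in seen:
        # Guard against two options files declaring each other as defaults.
        raise ValueError(f"circular options_file reference: {path}")
    seen.add(resolved)

    options = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line.strip() or line.startswith("%"):
            continue  # blank line or comment line
        columns = line.split("\t")
        if len(columns) < 2:
            continue  # no value column; skipped in this sketch
        # First column: attribute name; second column: value; rest ignored.
        options[columns[0].strip()] = parse_value(columns[1])

    default_file = options.get("options_file")
    if isinstance(default_file, str) and default_file:
        defaults = read_options(path.parent / default_file, seen)
        options = {**defaults, **options}  # current file wins (assumption)
    return options
```

Numeric values come back as floats and everything else as strings, so a quoted value such as 'eps' keeps its single quotes. A call like read_options("options.tsv") returns one dictionary in which the attributes of the default options file are filled in wherever the main file leaves them unset.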