Map metabolite identifiers

Get the list of network metabolites to map

To map your dataset metabolites, you need to import into MetExplore the list of corresponding identifiers specific to the BioSource you are working on. This list can be extracted from the “Metabolites” table obtained with the “metabolite identifier matcher” (as in the previous section). To extract the list of network identifiers corresponding to the HE dataset metabolites:

  • Open the “Metabolites” sheet of the xlsx file obtained from the “metabolite identifier matcher”

  • Filter the table to display only the rows with matching dataset metabolites: select the header row of the table, then click “Filter” in the Excel “Data” tab.

  • Drop-down arrows will appear in the header of each column.

  • Click the drop-down arrow of the “dataset_name” column → the Filter menu appears.

  • Uncheck the box next to “empty” → the table is filtered to display only the rows that have a non-empty “dataset_name” cell. You should get a list of 95 metabolites.
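If you prefer a script to the spreadsheet filter, the same selection can be sketched in Python. This minimal sketch assumes the “Metabolites” sheet has first been saved as CSV and that the column is named exactly “dataset_name”. The sample rows are made up for illustration and are not taken from the HE dataset.

```python
import csv
import io


def rows_with_dataset_match(csv_text):
    """Return the rows whose "dataset_name" cell is non-empty.

    Mirrors the spreadsheet step of unchecking "empty" in the column filter.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # A missing or whitespace-only cell counts as empty.
    return [row for row in reader if (row.get("dataset_name") or "").strip()]


# Hypothetical excerpt of the exported sheet (made-up rows, not tutorial data).
sample = (
    "BioSource_Identifier,dataset_name,average_distance\n"
    "M_met_L_c,Methionine,0.0\n"
    "M_glc_D_c,,\n"
    "M_met_L_m,Methionine,0.0\n"
)

matched = rows_with_dataset_match(sample)
print(len(matched), "rows with a dataset match")
```

On the real export, pass the contents of the CSV file, for example `open(path, newline="", encoding="utf-8").read()`. At this step the tutorial expects 95 rows.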

Fuzzy search aims at finding names even if they contain spelling mistakes, inversions, special characters, etc.

Use fuzzy search to find the metabolite missing from the mapping, “Indolelactic acid”, and add the corresponding identifiers to the list of metabolites to map.

Fuzzy search for indolelactic acid
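As an illustration of approximate string matching, here is a minimal sketch using Python's standard `difflib` module. It is not the algorithm MetExplore uses. The candidate names form a hypothetical list, and the `cutoff` of 0.6 is simply `difflib`'s default similarity threshold.

```python
import difflib

# Hypothetical list of network metabolite names (illustration only).
network_names = [
    "Indolelactic acid",
    "Indoleacetic acid",
    "L-Methionine",
    "D-Glucose",
]


def fuzzy_search(query, names, n=3, cutoff=0.6):
    """Return up to n names whose similarity ratio to query is >= cutoff.

    Matching is case-insensitive; results keep their original spelling
    and are ordered from most to least similar.
    """
    lowered = {name.lower(): name for name in names}
    hits = difflib.get_close_matches(query.lower(), list(lowered), n=n, cutoff=cutoff)
    return [lowered[hit] for hit in hits]


# The misspelled query ("indoleactic") still retrieves the intended name.
matches = fuzzy_search("indoleactic acid", network_names)
print(matches)
```

Here the near-homonym “Indoleacetic acid” scores just as high as “Indolelactic acid”. For this reason, always check fuzzy hits by eye before adding their identifiers to the list.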

Manually filter the metabolites to map

Note that the list of BioSource identifiers obtained with the “metabolite identifier matcher” often needs to be reformatted before mapping to deal with some ambiguous cases. Indeed, you may have noticed that the number of mapped metabolites (96) is higher than the initial number of metabolites in the HE signature. This is due to two main reasons:

First, since the metabolic network includes the different cellular compartments, an input metabolite can be mapped onto several compartments. For example, methionine (M_met_L) is found in the cytosol (M_met_L_c), the mitochondria (M_met_L_m), the lysosome (M_met_L_l), the extracellular space (M_met_L_e) and a “boundary” compartment (M_met_L_b).

  • The “extracellular space” (e) and “boundary” (b) compartments include metabolites that can be exchanged between the cell or organism and its environment (e.g., metabolites secreted into or taken up from the cell culture medium). These compartments are added to the metabolic network to account for exchange reactions with the environment. However, their metabolites are not measured in intracellular metabolomics analyses and should therefore be removed from the list.
  • The other compartments are intracellular (cytosol, mitochondria, etc.). Because the data contain no information about the cellular localization of the identified metabolites, all metabolites mapped in these intracellular compartments must be kept.
  • Remove the “*_b” and “*_e” metabolites, as well as the “M_Rtotal” metabolite, from the initial list.

  • Copy and paste the displayed list, keeping only the “BioSource_Identifier” and “average_distance” columns, into another Excel file or sheet.

Filter metabolites to map in Excel (screenshot from LibreOffice; Excel should look similar)
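The filtering rules above can also be sketched in Python. This minimal sketch assumes that every identifier ends with a compartment suffix after its last underscore (as in M_met_L_c) and that “M_Rtotal” may appear with or without such a suffix. The function names are hypothetical, and the rows reuse the methionine example with made-up distances.

```python
# Compartments holding exchanged, not intracellularly measured, metabolites.
EXCLUDED_COMPARTMENTS = {"b", "e"}  # boundary, extracellular space


def compartment_of(identifier):
    """Return the compartment suffix, e.g. 'M_met_L_c' -> 'c'."""
    return identifier.rsplit("_", 1)[-1]


def is_rtotal(identifier):
    """True for 'M_Rtotal' with or without a compartment suffix (assumption)."""
    return identifier == "M_Rtotal" or identifier.rsplit("_", 1)[0] == "M_Rtotal"


def keep_for_mapping(identifier):
    """True if the identifier should stay in the list of metabolites to map."""
    if is_rtotal(identifier):
        return False
    return compartment_of(identifier) not in EXCLUDED_COMPARTMENTS


def clean(rows):
    """Filter (identifier, average_distance) pairs, preserving their order."""
    return [(ident, dist) for ident, dist in rows if keep_for_mapping(ident)]


# Hypothetical rows based on the methionine example (distances made up).
rows = [
    ("M_met_L_c", 0.0),
    ("M_met_L_m", 0.0),
    ("M_met_L_l", 0.0),
    ("M_met_L_e", 0.0),
    ("M_met_L_b", 0.0),
    ("M_Rtotal_c", 0.0),
]
cleaned = clean(rows)
print(cleaned)
print(len({ident for ident, _ in cleaned}), "distinct network metabolites")
```

Counting the distinct identifiers at the end is one way to check the result against the number of distinct network metabolites expected for the cleaned list.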

A “cleaned” list of BioSource metabolite identifiers to be used for mapping is provided here.

This final list contains 63 distinct network metabolites.

We will use the columns “BioSource_Identifier” and “average_distance” (which corresponds to the matching distance between the metabolites from the HE dataset and the network metabolites).

Perform mapping in MetExplore

  • Select “Omics” → “Mapping” → “From Omics” in the menu at the top of the page: this opens a “Mapping” window.
  • Check the “Consider first row as header of columns” box if you have copied the first line containing the column headers.
  • Enter a “Mapping name”.
  • Copy the data from the Excel file (select only the columns “BioSource_Identifier” and “average_distance”) and paste it directly into the mapping grid using Ctrl+V.

Note that the first column of your input data must contain the type of identifier used to perform the mapping. The following columns are optional and hold numeric values for different conditions. Here, the only condition is the average matching distance.
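Before pasting, you can check that your data respect this layout. The sketch below assumes the copied cells arrive as tab-separated text, which is how spreadsheet programs such as Excel place a cell range on the clipboard, with an optional header row. The function and its checks are illustrative and not part of MetExplore.

```python
def parse_mapping_grid(text, has_header=True):
    """Parse tab-separated mapping input into (identifier, [values]) rows.

    The first column is the identifier used for the mapping; every
    following column must be numeric (one column per condition).
    Empty cells become None. Raises ValueError on a non-numeric value.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = None
    if has_header and lines:
        header, lines = lines[0].split("\t"), lines[1:]
    rows = []
    for number, line in enumerate(lines, start=1):
        identifier, *values = line.split("\t")
        try:
            numbers = [float(v) if v.strip() else None for v in values]
        except ValueError:
            raise ValueError(f"data row {number}: non-numeric value in {values!r}") from None
        rows.append((identifier.strip(), numbers))
    return header, rows


# Hypothetical clipboard content with a header row.
pasted = "BioSource_Identifier\taverage_distance\nM_met_L_c\t0.0\nM_met_L_m\t0.0\n"
header, rows = parse_mapping_grid(pasted)
print(header)
print(rows)
```

Set `has_header=False` when you did not copy the header line, which corresponds to leaving the “Consider first row as header of columns” box unchecked.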

Importing dataset metabolites for mapping in MetExplore

Mapping can be performed for all biological objects stored in a metabolic network (pathways, reactions, metabolites, enzymes, gene products and genes).

  • Select “Metabolite” as the type of biological objects you want to map in the “Object” menu

  • Select “Identifier” as the feature to indicate that you are providing BioSource identifiers.

  • Click on the “Map” button

Results of the mapping in MetExplore

Once the mapping is done, a new column “Identified” is displayed, indicating for each input metabolite whether it has been found in the network (true) or not (false). Some statistics are also displayed:

  • “Nb. Data”: the initial number of metabolites in the input dataset;
  • “Nb mapped”: the number of input metabolites that have been successfully mapped in the network;
  • “Nb. Data in the network”: the number of corresponding network metabolites. Note that the last two numbers may differ, for instance if two input metabolites map to the same network metabolite.
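To show how these three numbers relate, here is a minimal sketch of identifier mapping as an exact lookup in a set of network identifiers. It is a simplified model and not MetExplore's implementation. The identifiers are hypothetical, and one input identifier is deliberately repeated so that “Nb mapped” and “Nb. Data in the network” differ.

```python
def map_identifiers(input_ids, network_ids):
    """Flag each input identifier as found (True) or not (False) in the
    network, and compute the three statistics shown after mapping."""
    network = set(network_ids)
    identified = [(ident, ident in network) for ident in input_ids]
    stats = {
        "Nb. Data": len(input_ids),
        "Nb mapped": sum(found for _, found in identified),
        # Distinct network metabolites hit by at least one input.
        "Nb. Data in the network": len({ident for ident, found in identified if found}),
    }
    return identified, stats


# Hypothetical network and input list (M_met_L_c appears twice).
network_ids = ["M_met_L_c", "M_met_L_m", "M_glc_D_c"]
input_ids = ["M_met_L_c", "M_met_L_c", "M_met_L_m", "M_unknown_c"]

identified, stats = map_identifiers(input_ids, network_ids)
print(identified)
print(stats)
```

In this model, the repeated input is counted twice in “Nb mapped” but only once in “Nb. Data in the network”.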

The mapping can be exported with the “Save Mapping in File” button in the mapping window. It is saved as a JSON file, which can later be imported back into MetExplore through the “Import” menu in the top bar by selecting “Import Mapping from file”.