Reaction Chemistry Database
A review of an earlier version of The Chemical Thesaurus appeared in 2001
that perfectly sums up everything that The Chemical Thesaurus reaction
chemistry database project is trying to achieve:
Chemical Thesaurus is a reaction chemistry information system that
extends traditional references by providing hyperlinks between related
information. The program goes a long way toward meeting its ambitious
goal of creating a nonlinear reference for reaction information.
With its built-in connections, organizing themes, and multiple ways
to sort and view data, The Chemical Thesaurus is much greater than
the sum of the data in its database.
"The program does an excellent job of removing the artificial barriers between different subdisciplinary areas of chemistry by presenting a unified vision of inorganic and organic reaction chemistry."
K.R. Cousins, JACS, 123, 35, pp 8645-6 (2001)
OK, tell me more... Why is the software called "The Chemical Thesaurus"?
The word thesaurus means
storehouse, and The Chemical Thesaurus is a
storehouse of information about chemical species [entities] and
chemical reactions, interactions and processes.
Also, the application behaves
rather like the thesaurus built into our word processors that allows us
to jump from word to word by meaning:
The Chemical Thesaurus allows us to
jump from chemical to chemical via the associated interactions, reactions
and processes. For example, it is possible to click through the industrial
synthesis of nylon-6,6.
And, with The Chemical Thesaurus, it is possible to click
back to find out how nylon-6,6 is made.
The Chemical Thesaurus is a reaction chemistry explorer
How do I use the software?
Just click around... Explore... Discover...
That said, this
website is designed for people with an interest in chemistry and The Chemical Thesaurus may be a bit perplexing to non-scientists.
Briefly, The Chemical Thesaurus
reaction chemistry database & web application consists of just seven inter-linked screens:
- List Chemical Entities (plus an associated/expanded sorts & finds page)
- Chemical Entities data page
- List Interactions, Reactions & Processes
- Interactions, Reactions & Processes data page
- List Mechanisms & Collections
- Mechanisms & Collections data page
The thing to remember is that a particular interaction, reaction or process can either be found by searching for the chemical entities that partake as substrates, reagents, solvents, catalysts, products, by-products, or by how the interaction, reaction or process is classified by mechanism or collection.
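The two routes to a reaction can be sketched in a few lines of Python. This is only an illustration of the idea, not the code behind the website: the `Reaction` class, its field names, the classification labels and the two sample records are all assumptions made for the example.

```python
from dataclasses import dataclass, field


# Hypothetical record layout; the real database tables are not shown on this page.
@dataclass
class Reaction:
    name: str
    # role -> entities playing that role (substrate, reagent, product, ...)
    participants: dict[str, list[str]]
    # mechanisms or collections the reaction is filed under (illustrative labels)
    classifications: list[str] = field(default_factory=list)


REACTIONS = [
    Reaction(
        name="hydrogen combustion",
        participants={"substrate": ["H2"], "reagent": ["O2"], "product": ["H2O"]},
        classifications=["redox"],
    ),
    Reaction(
        name="sodium chloride formation",
        participants={"substrate": ["Na+"], "reagent": ["Cl-"], "product": ["NaCl"]},
        classifications=["main group chemistry"],
    ),
]


def find_by_entity(entity, role=None):
    """Route 1: reactions in which an entity takes part, optionally in a given role."""
    hits = []
    for rxn in REACTIONS:
        for r, entities in rxn.participants.items():
            if entity in entities and (role is None or r == role):
                hits.append(rxn.name)
                break  # list each reaction once
    return hits


def find_by_classification(label):
    """Route 2: reactions filed under a mechanism or collection."""
    return [rxn.name for rxn in REACTIONS if label in rxn.classifications]
```

Calling `find_by_entity("H2O")` and `find_by_classification("redox")` both lead to the same hydrogen combustion record, which is the point: the entity route and the classification route are two doors into the same room.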
Chemistry: The Study of Matter and its Changes
Chemistry is often described
as the study of matter and its changes. This is crucial because
the relational database schema that underlies The Chemical Thesaurus,
the very architecture of the application, is explicitly designed
in terms of matter and the changes that occur to matter.
Matter is considered in
terms of chemical entities.
Changes to matter are
considered in terms of the interactions, reactions and/or processes
of defined chemical entities.
The term chemical
entity is used because it is inclusive and can be used to
group together all objects of chemical interest including: atoms, isotopes,
molecular substances & discrete molecules, photons, metals, alloys,
ionic salts, network materials, electrons, ions, radicals, reactive intermediates,
generic species such as nucleophile, and even specialist apparatus like
the Dean & Stark trap.
No other term is as inclusive:
- The sodium ion, Na+, is a chemical species but not a substance or a material.
- Diamond is a material and it is a substance, but not a species.
- Aldehydes and nucleophiles are hypothetical, generic objects.
- The Dean & Stark trap is glassware.
The classification of matter into the various types of chemical entity used in The Chemical
Thesaurus is discussed in detail in the Chemogenesis web book.
A particular chemical
entity may have one name or several synonyms. For example the compound
CH3I is commonly called both methyl iodide and
iodomethane, and both names appear in the synonyms database.
All chemical changes can
be described by chemical equations:
- 2 H2 + O2 → 2 H2O
- crude oil → methane, propane, butane...
- A + B → C
The reaction equation is a powerful metaphor, able to describe processes from elementary particle interactions to biochemistry.
Reaction equations can be balanced in terms of numbers of entities, mass, enthalpy, entropy and Gibbs free energy, or they may be unbalanced.
Hypothetical interactions and processes can be described.
Both physical changes and chemical changes can be modelled by chemical reaction equations.
Actually, there is no theoretical or clear-cut separation between "physical" and "chemical" change, although the distinction may sometimes be useful with beginning science students. Technically, all material changes are changes in phase space.
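As a small illustration of what "balanced in terms of numbers of entities" means at the level of atoms, here is a minimal Python sketch. It is an assumption-laden toy, not something taken from The Chemical Thesaurus: the function names are invented for the example, it only understands simple formulas such as H2O (no brackets, charges or hydrates), and it says nothing about mass, enthalpy, entropy or Gibbs free energy.

```python
import re
from collections import Counter


def atom_counts(formula):
    """Count atoms in a simple formula such as 'H2O' (no brackets or charges)."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def side_counts(side):
    """Sum atom counts over one side, given as (coefficient, formula) pairs."""
    total = Counter()
    for coefficient, formula in side:
        for symbol, n in atom_counts(formula).items():
            total[symbol] += coefficient * n
    return total


def is_balanced(reactants, products):
    """True when both sides contain the same number of each kind of atom."""
    return side_counts(reactants) == side_counts(products)


# 2 H2 + O2 gives 2 H2O: four H and two O on each side.
print(is_balanced([(2, "H2"), (1, "O2")], [(2, "H2O")]))
# H2 + O2 gives H2O: a legitimate but unbalanced equation.
print(is_balanced([(1, "H2"), (1, "O2")], [(1, "H2O")]))
```

An unbalanced equation is not rejected here, merely reported as unbalanced, in keeping with the remark above that reaction equations may be left unbalanced.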
The RDBMS Engine
The Chemical Thesaurus reaction
chemistry database on the web uses the MySQL
relational database engine to serve several relationally linked database
tables. There are several entry points into the table system.
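To give a feel for what "relationally linked tables" and "entry points" might look like, here is a minimal sketch. Everything in it is an assumption: the table and column names are hypothetical, the sample rows are illustrative, and Python's built-in `sqlite3` module stands in for MySQL so that the example runs without a database server.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE entity   (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE synonym  (entity_id INTEGER REFERENCES entity(id), name TEXT);
    CREATE TABLE reaction (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE participant (
        reaction_id INTEGER REFERENCES reaction(id),
        entity_id   INTEGER REFERENCES entity(id),
        role        TEXT  -- substrate, reagent, solvent, catalyst, product, by-product
    );
    """
)

# Illustrative sample rows only.
conn.executemany(
    "INSERT INTO entity VALUES (?, ?)",
    [(1, "iodomethane"), (2, "hydroxide ion"), (3, "methanol"), (4, "iodide ion")],
)
conn.execute("INSERT INTO synonym VALUES (1, 'methyl iodide')")
conn.execute("INSERT INTO reaction VALUES (1, 'CH3I + OH- -> CH3OH + I-')")
conn.executemany(
    "INSERT INTO participant VALUES (1, ?, ?)",
    [(1, "substrate"), (2, "reagent"), (3, "product"), (4, "by-product")],
)


def reactions_for(name):
    """Enter via an entity name or one of its synonyms; return (description, role) rows."""
    return conn.execute(
        """
        SELECT DISTINCT r.description, p.role
        FROM entity e
        LEFT JOIN synonym s ON s.entity_id = e.id
        JOIN participant p ON p.entity_id = e.id
        JOIN reaction r ON r.id = p.reaction_id
        WHERE e.name = ? OR s.name = ?
        """,
        (name, name),
    ).fetchall()


print(reactions_for("methyl iodide"))
print(reactions_for("methanol"))
```

The `participant` table is the link between matter (entities) and change (reactions): each row says that one entity plays one role in one reaction, so the same reaction can be reached from any of its participants.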
What Data is Included?
It is not possible to add all
chemistry to any one database, so the decision has been made to fill The
Chemical Thesaurus reaction chemistry database with as much simple, fundamental
and important reaction chemistry as possible. The policy has been to scatter
wide rather than to pile deep. That said, it is hoped that
the database now contains all of the reaction chemistry knowledge that
a chemistry major would be expected to be familiar with, specialist
Although the database contains
some references to the primary and secondary literature, the data is mainly
textbook level. But this should be seen as a strength rather than a weakness
because The Chemical Thesaurus reaction chemistry database, in tandem
with the Chemogenesis
web book, attempts to describe and model reaction chemistry space
from the ground up. Currently, The Chemical Thesaurus reaction chemistry
database holds information on:
- quarks, leptons & selected hadrons
- the proton, neutron & electrons
- isotopes, atoms, atomic ions
- nucleosynthesis & radioactive decay series
- simple molecules & molecular ions
- VSEPR geometries
- main group chemistry
- inorganic industrial chemistry
- organic industrial chemistry
- organic functional groups & FG reaction chemistry
- reaction mechanisms
- Lewis acids, Lewis bases & Lewis acid/base complexes
- redox agents, radicals, diradicals, photochemistry
- pericyclic processes
- Brønsted acids & conjugate bases
- material types, polymers, minerals, alloys
- explosives, flame chemistry
- selected natural products, common pharmaceuticals & their classes
- and more...
The Chemical Thesaurus holds
sample data on:
- organic chemistry of real species
- synthetic routes
- transition metal chemistry
- organometallic chemistry
These are truly vast areas
of human knowledge and comprehensive coverage is totally outside the scope
of the current iteration of this project. Detailed
information about the chemistry of real species is held in the primary
literature, a resource that consists of more than a hundred scientific
journals, plus various academic and commercial chemistry databases:
- CAS (the Chemical Abstracts Service) has a database of 27
million substances (as of Jan 2006), and about 4000 more are added
per day, that is, more than a million a year
- Beilstein: 9 million organic substances
2 million inorganic compounds
What are all these "(generic)" entities?
Chemistry is commonly discussed
in terms of hypothetical species with ideal behaviour, with real species
assigned to these ideal, generic species. Consider the statement:
"Acetaldehyde and propanal are aldehydes."
Acetaldehyde and propanal
are real chemical entities, while the hypothetical aldehyde is
an idealised generic species.
The term 'Markush structure
or group' is sometimes used for generic species, particularly in the patent
literature.
This logic is formalised and
developed in The Chemical Thesaurus. This is possible because the reaction
chemistry database can hold information about any type of chemical object:
and generic species such as: aldehyde (generic)
Moreover, the software
allows the user to jump between real species and their associated generic
species.
For example, acetic acid is a carboxylic acid and clicking on the Carboxylic acid (generic) link will jump to a page where all of the carboxylic acids in the database are listed.
Don't worry, it is much easier
to do with a click of the mouse than it is to explain in words! But
you may have been wondering what all the references to "generic" were. Generic species
are always listed with (generic) after the name to avoid
confusion.
A great deal of
chemical education involves understanding the chemistry of generic species,
and learning how to associate real species and generic species with each
other. This approach is integral to how The Chemical Thesaurus is organised.
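The two-way jump between real and generic species can be pictured as a pair of lookups. The sketch below is a toy illustration rather than the site's implementation: the dictionary names and functions are hypothetical, and the only assignments used are the ones given as examples in the text.

```python
from collections import defaultdict

# Real species -> generic species, using only the examples from the text.
GENERIC_OF = {
    "acetaldehyde": ["aldehyde (generic)"],
    "propanal": ["aldehyde (generic)"],
    "acetic acid": ["carboxylic acid (generic)"],
}

# Build the reverse index: generic species -> real species.
MEMBERS_OF = defaultdict(list)
for real, generics in GENERIC_OF.items():
    for generic in generics:
        MEMBERS_OF[generic].append(real)


def generics_for(real):
    """Jump from a real species to its generic species."""
    return GENERIC_OF.get(real, [])


def members_of(generic):
    """Jump from a generic species to every real species assigned to it."""
    return sorted(MEMBERS_OF.get(generic, []))


print(generics_for("acetic acid"))
print(members_of("aldehyde (generic)"))
```

A list is used on the real-species side because one real species may well belong to more than one generic class, although none of the examples in the text needs that.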
Test your knowledge
by going to the Chemistry Tutorials & Drills web site.
Retro Synthetic Analysis
Retro Synthetic Analysis (RSA) is a technique
employed in advanced synthetic organic chemistry
to help design the sequence of reactions leading to a large, multifunctional molecular
entity, such as a natural product or pharmaceutical agent.
The aim is to logically find the synthetic building blocks required for construction.
This is achieved by looking
for strategic bonds and the potential functional group inter-conversions
in a molecule, and
then deducing the synthetic entities, or "synthons", required
to construct the desired molecule in the lab.
For example, acetic anhydride can be disconnected into an "acetyl cation
synthon" and an "acetate ion synthon".
There is no actual
reaction in which an acetyl cation reacts with an acetate anion, because
both ions require counter ions, however, the RSA analysis is conceptually
RSA deconstruction logic has
been extended in The Chemical Thesaurus to main group chemistry. For example,
the trivial Na+ plus Cl− reaction to give sodium chloride is shown
as a retro synthetic disconnection.
Chemical Naming & Identification Issues
Please note that even simple
chemistry can generate naming problems. For example:
The chemistry of
elemental sulfur is commonly written in terms of S, as it is
here, but the species S does not exist, at least not below 1000°C.
Flowers of sulfur,
the common yellow soft crystalline form of the element, is S8.
If this species were to be used in the reaction chemistry database
all stoichiometries would have to be multiplied by 8 and the numbers
would become unnecessarily cumbersome.
The species S1 is invented for the sake of simplicity.
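The effect of insisting on S8 is easy to see with a small worked example. The sketch below is purely illustrative (the function names are made up for the purpose): it takes an equation written with monatomic S, multiplies every coefficient by 8, and then rewrites the sulfur as S8.

```python
S8_ATOMS = 8  # atoms of sulfur in one S8 ring


def scale(side, factor):
    """Multiply every stoichiometric coefficient on one side by a factor."""
    return {species: coeff * factor for species, coeff in side.items()}


def with_s8(reactants, products):
    """Rewrite an equation written with 'S' so that sulfur appears as S8."""

    def convert(side):
        scaled = scale(side, S8_ATOMS)
        if "S" in scaled:
            # 8n atoms of S become n molecules of S8
            scaled["S8"] = scaled.pop("S") // S8_ATOMS
        return scaled

    return convert(reactants), convert(products)


# S + O2 gives SO2, written with the simplified species S.
simple = ({"S": 1, "O2": 1}, {"SO2": 1})
print(with_s8(*simple))
```

The tidy 1 : 1 : 1 equation becomes S8 + 8 O2 giving 8 SO2, and every other sulfur reaction in the database would pick up the same factor of 8.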
Likewise, there are two types
of proton, H+, in the database:
The proton of
high energy physics: H+(vacuum).
The proton associated
with Brønsted acid reaction chemistry: H+(solvated).
A decision has
been made to have separate entries for these two types of proton.
A decision has also been made
to have separate entries for minerals and reagent chemicals.
The reason is that
few minerals are chemically pure and chemists like composition to be
defined within 1% or better. The decision to separate minerals from
chemical reagents leads to occasional double entries, such as two entries
for gypsum: gypsum the mineral of variable composition and gypsum the
pure chemical reagent.
Another problem results from
the usual conventions of writing chemical equations: reaction products
and by-products are expressed as pure materials even though they seldom
are.
For example, an
aqueous industrial manufacturing process may produce sulfuric acid in
water as a by-product. Clicking on the sulfuric acid icon will transport
the user to the concentrated sulfuric acid data page, yet it
is not possible (with any energetic efficiency, at least) to convert
aqueous sulfuric acid into concentrated sulfuric acid.
Chemical intelligence is required when navigating through the relationally
linked database tables.
Queries, Suggestions, Bugs, Errors, Typos...
If you have any:
- Suggestions for links or future developments
- Bug, typo or grammatical or factual error reports on this page or site,
contact Mark R. Leach, the author, using firstname.lastname@example.org
This free, open
access web resource is an ongoing project and your input is
© Mark R. Leach