TemaSearch for HTML Forms

Overview

Adding TemaSearch to an existing search form requires only a few changes to the form's HTML code. The changes are straightforward, though they require a basic understanding of HTML, in particular HTML forms.

Before you begin, you will need

Technical Requirements

TemaSearch has a few technical requirements that your search engine must meet:

  1. The form must use the GET method for submitting to the server.

    If, instead, your current form uses POST, you may still be able to use TemaSearch, as search engines that use POST often support GET as well. To see if your search engine supports GET, change the form's method to GET, and try it out. If it works as normal, you can continue to add TemaSearch to your site. If it does not work, contact us and we can discuss the alternative solutions available.

  2. Your search engine must be able to search for more than one word at a time and return documents containing any of the words (documents need not contain all of the words). This is usually expressed using "OR", although your search engine may use some other syntax. If the query syntax used by your search engine is not one of the supported syntaxes, please contact us.
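
As a minimal sketch of the check described in requirement 1, assume a hypothetical form that currently submits to "search" using POST. Only the method attribute needs to change for the test, so the opening tag

<FORM method=POST action="search">

becomes

<FORM method=GET action="search">

If a test search submitted through the changed form returns the usual results, the search engine accepts GET.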

Adding TemaSearch to your Site

The code below shows an example search form:

<FORM method=GET action="search">
<INPUT type="text" name="q" size="30" value="">
<INPUT type="submit" name="submit" VALUE="Søk">
</FORM>

To activate TemaSearch, you add a small number of hidden input fields (or parameters, for short) to the form. Each parameter has a name and a value, and is written as a new INPUT element. For example, the code below adds a parameter "pq" with the value "boolean".

<INPUT type="hidden" name="pq" value="boolean">

You will need to add at least the parameters shown in the table below. Each parameter is described in detail in the Parameter Summary.

  puser    Your TemaSearch username.
  pu       The search engine URL. This is usually the same as the original action attribute, fully qualified if necessary.
  pq       The name of the field that contains the search expression. TemaSearch modifies the field to include new search words.
  px       Lists all the fields that were added for TemaSearch, so that they are not forwarded to the target search engine.
  pqsntx   Describes how queries are written for your search engine.

If we assume that the example form is hosted at http://www.myhost.com/search, that our username is "jsmith", and that the "standard" query syntax is used, then we add these parameters:

  puser    jsmith
  pu       http://www.myhost.com/search
  px       px pu puser pq pqsntx
  pqsntx   standard
  pq       q

Writing these parameters as hidden input elements, the example form looks like this (the five hidden INPUT elements are new):

<FORM method=GET action="search">
<INPUT type="hidden" name="puser" value="jsmith">
<INPUT type="hidden" name="pu" value="http://www.myhost.com/search">
<INPUT type="hidden" name="px" value="pu px pq pqsntx puser">
<INPUT type="hidden" name="pq" value="q">
<INPUT type="hidden" name="pqsntx" value="standard">
<INPUT type="text" name="q" size="30" value="">
<INPUT type="submit" name="submit" VALUE="Søk">
</FORM>

Now that the parameters have been added, the final step is to change the action attribute to the TemaSearch URL you were given with your account. For example, if the TemaSearch URL you are given is http://www.tema-search.no/form then the form tag is changed from

<FORM method=GET action="search">

to

<FORM method=GET action="http://www.tema-search.no/form">

And that's it! Once you have saved the page to your site, TemaSearch will be enabled on the search field. If you have other search fields on your site, you can add TemaSearch to each of them by repeating these steps for the other fields. (If these fields use a different target URL, they must be registered with your account. See the section on security for details.)
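
As an illustration of a second search field, the sketch below assumes a hypothetical archive search at http://www.myhost.com/archive/search whose query field is named "term". Both of these names are invented for the example. Only the pu and pq values differ from the first form, and the different target URL would need to be registered with the account.

<FORM method=GET action="http://www.tema-search.no/form">
<INPUT type="hidden" name="puser" value="jsmith">
<INPUT type="hidden" name="pu" value="http://www.myhost.com/archive/search">
<INPUT type="hidden" name="px" value="pu px pq pqsntx puser">
<INPUT type="hidden" name="pq" value="term">
<INPUT type="hidden" name="pqsntx" value="standard">
<INPUT type="text" name="term" size="30" value="">
<INPUT type="submit" name="submit" VALUE="Søk">
</FORM>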

Advanced Options

TemaSearch offers a number of configuration options that you can use to control exactly what types of alternative words are generated:

  povs     Includes/excludes bokmål/nynorsk translations.
  psynn    Includes/excludes near synonyms.
  pifl     Includes/excludes equivalent inflections.
  penab    Turns TemaSearch on or off.

By default, all these choices are active, so the modified query will include translations, synonyms and inflections where appropriate. A number of other parameters are available for configuring TemaSearch more precisely; see Parameter Summary.

Not all accounts include access to advanced options. Please check your account details to ensure you have access.

Administrator Configuration

A site administrator can control the behaviour of TemaSearch by providing fixed values for TemaSearch parameters. Each parameter is configured by adding a hidden field to the form, in the same way the standard parameters were previously added.

For example, to include only translations (i.e. exclude synonyms and inflections), the psynn and pifl parameters are added to the form:

<INPUT type="hidden" name="psynn" value="0">
<INPUT type="hidden" name="pifl" value="0">

(Although not shown here, you need to add "psynn pifl" to the px parameter, to be sure all parameters are recognised by TemaSearch.)
The form does not need to include the povs parameter to enable translations, as they are enabled by default.
When you now use your search page, you'll see that TemaSearch adds translations, but not synonyms or alternative inflections.
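
As a sketch of the complete form after this change, assume the example account and URLs used earlier in this document, and the parameter name pifl as given in the Parameter Summary. The two new fields are added and both names are appended to the px list:

<FORM method=GET action="http://www.tema-search.no/form">
<INPUT type="hidden" name="puser" value="jsmith">
<INPUT type="hidden" name="pu" value="http://www.myhost.com/search">
<INPUT type="hidden" name="px" value="pu px pq pqsntx puser psynn pifl">
<INPUT type="hidden" name="pq" value="q">
<INPUT type="hidden" name="pqsntx" value="standard">
<INPUT type="hidden" name="psynn" value="0">
<INPUT type="hidden" name="pifl" value="0">
<INPUT type="text" name="q" size="30" value="">
<INPUT type="submit" name="submit" VALUE="Søk">
</FORM>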

User Configuration

It is possible to give website users control over the types of results produced, so that they can adjust TemaSearch to their own needs. This is achieved by using FORM input elements, such as checkboxes, list boxes etc. to provide parameter values.

For example, to allow the user to enable or disable use of translations, you add:

<INPUT type="checkbox" name="povs" value="1" CHECKED>

(Not forgetting of course to update the px parameter.) When displayed in a browser, the form includes a checkbox that controls the use of translations in TemaSearch. The CHECKED attribute means that it is checked by default and that translations are active unless the user specifically turns them off by clearing the checkbox. The other on/off-style parameters, such as psynn, pifl and penab can also be controlled in this manner.
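
Put together, a form with a user-controlled translation option might look like the sketch below. It assumes the example account and URLs used earlier in this document. The label text "Include translations" is invented for the example. Note that povs has been appended to the px list.

<FORM method=GET action="http://www.tema-search.no/form">
<INPUT type="hidden" name="puser" value="jsmith">
<INPUT type="hidden" name="pu" value="http://www.myhost.com/search">
<INPUT type="hidden" name="px" value="pu px pq pqsntx puser povs">
<INPUT type="hidden" name="pq" value="q">
<INPUT type="hidden" name="pqsntx" value="standard">
<INPUT type="text" name="q" size="30" value="">
<INPUT type="checkbox" name="povs" value="1" CHECKED> Include translations
<INPUT type="submit" name="submit" VALUE="Søk">
</FORM>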

Parameter Summary

This section provides a reference for all form parameters used by TemaSearch.

Mandatory parameters are parameters that do not have a default value. Such parameters must be defined, or the request will fail with an error message indicating the missing parameter.

'paltlic' parameter

Summary Alters options to comply with licenced features.
Details If an option is selected for a feature that is not licenced, normally an error is produced. (The result of the error depends upon the pnoerr parameter, but at the very least, no alternatives will be added to the query.) Setting this parameter allows the search to continue using those features that are licenced, ignoring the request for unlicenced features.

In a production environment, you would typically set this value to true so that attempted use of unlicenced features does not stop TemaSearch from being used for those features that are licenced. In development, setting this value to false can help uncover that some features are not working because they are unlicenced.

Values
false Don't adjust features to comply with the licence. If an unlicenced feature is used, the query is not rewritten at all and an error is returned.
true Adjust features to comply with the licence. If an unlicenced feature is requested, the request is ignored and the search continues using those features that are licenced.
Default true

'pbase' parameter

Summary Adds the baseform of an inflected form.
Details When the query includes an inflected word, the non-inflected form of that word is added to the query.
If stemming is active, this parameter has no effect.
Values
1 Baseforms of inflected words are added.
0 Baseforms of inflected words are not added.
Default 0

'pcharset' parameter

Summary Indicates the character set expected by the search engine.
Details This is usually the same as the character set defined by the META tag. For example

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

This defines the character set to be "utf-8", so the pcharset should be set to "UTF-8".
Values
ISO-8859-1 Selects ISO-8859-1 as the character set.
UTF-8 Selects UTF-8 as the character set.
Default ISO-8859-1
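
For illustration, a page encoded in UTF-8 whose search engine also expects UTF-8 might carry the following fields. This is a sketch that assumes the example form used earlier in this document. Only the pcharset field and the extended px list are new.

<INPUT type="hidden" name="px" value="pu px pq pqsntx puser pcharset">
<INPUT type="hidden" name="pcharset" value="UTF-8">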

'penab' parameter

Summary Enables or disables TemaSearch.
Values
0 TemaSearch is disabled and no changes are made to the search query.
1 TemaSearch is enabled and alternative words are added to the query.
Default 1

'pifl' parameter

Summary Includes alternative inflections.
Values
0 alternative inflections are not included
1 alternative inflections are included
Default 1

'plangin' parameter

Summary Provides a hint as to the language the user is using to enter the search query.
Details

When the search query could be both bokmål and nynorsk, the language TemaSearch assumes it to be is given by this parameter. Note that if the language of the input query is not ambiguous, then this parameter has no effect.

This parameter is useful if the user base is strongly biased to one language or the other.
Values
no_NO_B When ambiguous, the search text is treated as bokmål.
no_NO When ambiguous, the search text is treated as nynorsk.
(Empty string) The search text is treated as both bokmål and nynorsk.
Default (Empty string)

'plangout' parameter

Summary Selects the language for new words added to the query.
Details This parameter is usually set if the documents being searched are all written in the same language. Setting this parameter to that language ensures that the rewritten query only contains words in that language. If the documents being searched comprise a mix of languages, this parameter should not be set, so all relevant languages are included.
While similar, this is not the same as disabling translation. For example, assume the output language is set to bokmål. A nynorsk user will still require translation (to bokmål) while a bokmål user will not. So, translation is used if the text the user types needs translating to the output language.
Values
no_NO_B Only bokmål words are added to the query. If they were translated, any nynorsk words in the original query are removed.
no_NO Only nynorsk words are added to the query. If they were translated, bokmål words are removed.
(Empty string) Both bokmål and nynorsk are added to the query and no words are removed.
Default (Empty string)
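
As a hedged example, assume a site whose documents are all written in bokmål. Using the example form from earlier in this document, the output language could be fixed by adding the plangout field and listing it in px:

<INPUT type="hidden" name="px" value="pu px pq pqsntx puser plangout">
<INPUT type="hidden" name="plangout" value="no_NO_B">

With this setting, a query typed in nynorsk would still be translated, since the added words must be bokmål.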

'plemifl' parameter

Summary Adds all inflections of a non-inflected word.
Details If the query includes a non-inflected word, then inflections of that word are added to the query.
Inflected forms are output even if stemming is active (pstem=1), though you typically will not enable this parameter if stemming is active.
Values
1 Baseform inflections are enabled.
0 Baseform inflections are disabled.
Default 0

'pmax1' parameter

Summary Limits the total number of words that are included in the query.
Details This is useful for keeping the overall number of words to a reasonable level, especially when the search engine accepts only up to a specific maximum number of words. Often, if the number of words goes over this limit, the additional words are ignored, which may cause important words in the query to be dropped. When TemaSearch can provide more words than there is room for, it discards the least relevant words to ensure that all the original words and the 'best' alternatives are included in the query, without going over the total-word limit.
Values
-1 The number of words allowed in the query is not limited.
N Limits the number of words to N.
Default -1

'pmax2' parameter

Summary Limits the number of words that TemaSearch will add to each word in the query.
Details If many options are selected, such as both translations and synonyms, some words may have a large number of possible alternatives. In some cases, a large number of alternatives can make the search less precise. Setting this value to a lower number can help maintain a high search precision. When there are more alternatives for a given word than are allowed by this parameter, the least relevant words are discarded until the number of alternative words is within this limit.
Values
-1 The number of alternatives added to each word in the query is not limited.
N Limits the number of alternatives for each word to N.
Default 4
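
The two limits can be combined. The sketch below assumes the example form from earlier in this document and a hypothetical search engine that accepts at most 20 words per query. The values 20 and 2 are chosen only for illustration.

<INPUT type="hidden" name="px" value="pu px pq pqsntx puser pmax1 pmax2">
<INPUT type="hidden" name="pmax1" value="20">
<INPUT type="hidden" name="pmax2" value="2">

Here each word in the query receives at most two alternatives, and the query as a whole is kept to 20 words.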

'pnoerr' parameter

Summary Controls error reporting.
Details This parameter is intended as a development aid. When setting up TemaSearch or making changes to your TemaSearch configuration, it is a good idea to enable error reporting by either removing this parameter or setting it to "0". If there are problems with any of the parameters, these problems are shown instead of the search results so that you can fix the problem. Eventually, when you have tested that TemaSearch functions as you want, you can disable error reporting. This will ensure that your users are not disrupted should errors occur when using TemaSearch.
Values
0 errors are reported
1 errors are not reported
Default 0
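
A development setup might look like the following sketch, which assumes the example form from earlier in this document. It turns error reporting on and, using the paltlic parameter described above, makes requests for unlicenced features fail visibly:

<INPUT type="hidden" name="px" value="pu px pq pqsntx puser pnoerr paltlic">
<INPUT type="hidden" name="pnoerr" value="0">
<INPUT type="hidden" name="paltlic" value="false">

Before going live, the same two fields would typically be switched to pnoerr "1" and paltlic "true".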

'povs' parameter

Summary Includes translations (nynorsk or bokmål).
Values
0 translations are not included
1 translations are included
Default 1

'pq' parameter

Summary The name of the parameter in the original form that holds the search query.
Details This usually corresponds to an input field of type "text".

'pqsntx' parameter

Summary Indicates how the search engine expects the query to be written.
Details

This is used by TemaSearch to ensure new words can be added in a way that is compatible with the search engine. Choose the syntax that most closely matches the features of your search engine. The most important features are how the search engine expects mandatory and alternative words to be written. It is not a problem if a syntax includes features that are not available with your search engine: a listed feature only means that TemaSearch understands it should a user include it in a query. TemaSearch itself only changes the query using the boolean OR operator, in the form in which it is written for your search engine.

All syntaxes allow the use of the word prefixes + and - to specifically include or exclude words. TemaSearch does not provide alternatives for these words.

Values
standard Adjacent words are assumed to be mandatory (boolean AND). Optional words are separated with 'OR'.
standard-lower Adjacent words are assumed to be mandatory (boolean AND). Optional words separated with 'or'.
standard-lower+paren Adjacent words are assumed to be mandatory (boolean AND). Optional words are separated with 'or'. Alternatives added by TemaSearch are enclosed in parentheses, which can help group the terms correctly in complex queries.
standard-or Words are assumed to be alternatives (boolean OR) unless otherwise indicated with a boolean operator.
bool-paren-and Words not enclosed in parentheses are assumed to be mandatory (boolean AND). Words enclosed in parentheses are alternatives (boolean OR).
bool-plus-comma '+' for boolean AND, ',' for boolean OR. Words separated by whitespace are taken as phrases.
norsk-bool-paren-and All words are assumed to be mandatory (boolean AND). Words enclosed in parentheses are alternatives (boolean OR). The aliases OG, ELLER, IKKE are available.

'pstem' parameter

Summary Controls stemming.
Details An inflected word (such as a plural noun, or past-tense verb) produces alternative words that are similarly inflected. Some search engines automatically search for inflections of a word, and so there is little need to include alternative words as inflections, as the non-inflected form is sufficient. Use of stemming with search engines that do not search for inflections, that is, engines that perform an exact match, can help reduce the number of search terms for a more focused result.
Values
0 No stemming is applied. Alternative words added to the query are inflected according to the input word.
1 The input words are stemmed. Alternative words added to the query are not inflected, even when the input words are inflected.
Default 0

'psyng' parameter

Summary Includes general synonyms. General synonyms are words that are close but not always identical in meaning for all senses of the original word. This is typically used to expand the search into related areas.
Values
0 general synonyms are not included
1 general synonyms are included
Default 0

'psynn' parameter

Summary Includes near synonyms. Near synonyms are words that are identical, or almost identical in meaning to the original word. These mostly include spelling variations for a given word.
Values
0 near synonyms are not included
1 near synonyms are included
Default 1

'pu' parameter

Summary The fully qualified URL of the search results page.
Details This is usually the same as the action attribute, though this parameter must be fully-qualified, so protocol and server name will need to be included in the parameter if not present in the action.

'puser' parameter

Summary The name of the TemaSearch account to use to gain access to TemaSearch services.

'px' parameter

Summary Lists all TemaSearch parameters added to the form.
Details

The FORM element contains parameters for your search engine and parameters for TemaSearch. So that TemaSearch can locate the information it needs, all TemaSearch parameters must be listed in the px parameter, separated by spaces.

For example, if the parameters "pq" and "puser" were added, then the "px" parameter should be defined as "px pq puser" (note that the list includes the px parameter itself). If any of the TemaSearch parameters have to be renamed to avoid clashes with existing parameters, the renamed parameter should appear in the list. (See Resolving Parameter Name Conflicts for details.)

Resolving Parameter Name Conflicts

The TemaSearch parameter names have been chosen to be fairly unusual so as not to clash with existing parameter names in your form. But if a TemaSearch parameter does have the same name as a parameter already defined by your form, you can rename the TemaSearch parameter to avoid having two parameters with the same name. Renaming is done by adding an underscore '_' at the front of the name.

For example, if the parameter "puser" was already used by your form, then the TemaSearch parameter should be renamed to "_puser".

The px parameter lists all the TemaSearch parameters added to the form, and includes each parameter using the name as it appears in the form. For example, after renaming puser, the px parameter is defined like this:

<INPUT type="hidden" name="_puser" value="jsmith">
<INPUT type="hidden" name="px" value="pu px pq pqsntx _puser">

Frequently Asked Questions

General

Q.

I do not have access/do not want to change the current form on our webserver, but I would still like to try out TemaSearch. Is this possible?

A.

Yes! You can still evaluate TemaSearch even if you do not have permission or do not wish to change the search page on your current site.

  1. Create a local copy of the web page containing the search form, by saving the page in your browser to your local drive (usually done using the menu item "File | Save As...").
  2. Make the changes to the page on your local drive, as described in this document. Save the changes to disk.
  3. Open the saved page in your browser. When you type in a search query, you will see the search results from your main web site.
Note that because the page is saved to a new location, relative links and resources will not display, although this should not affect your ability to test TemaSearch.

 
Q.

Will using TemaSearch affect the performance of my site?

A.

Our experience shows that adding TemaSearch to a site introduces a delay of around 100 milliseconds, or 1/10 of a second. In practice, when added to all the other delays present in the web, this delay is virtually unnoticeable.

Details

After integrating TemaSearch, the additional time required to produce the search results can be attributed to:
  1. Network delay contacting the TemaSearch server
  2. TemaSearch processing (finding alternative words)
  3. Additional time required by your search engine to process the modified query.

To ensure network delay is minimal, we aim to provide you access to a TemaSearch server that is geographically close to your own server, resulting in faster access time. Typical network delay is around 20-50 milliseconds.

The time required by TemaSearch to process a request and rewrite a query is in the order of 1-10ms.

The additional words added to the search query may increase the time required by your search engine. The exact increase will depend on the efficiency of the search engine you use. Most "world-class" search engines use highly efficient implementations, resulting in no noticeable increase in processing time (typically a few tens of milliseconds).

Adding all of these delays together gives a total delay of around 100 ms.

 
Q.

For words that are inflected, TemaSearch adds alternatives that are also inflected. How are these corresponding inflected words produced?

A.

TemaSearch incorporates full morphological analysis and synthesis for both Bokmål and Nynorsk. That is, inflection details are maintained for every single word in the system. Inflections are produced based on the grammatical category describing the inflection. This ensures that the correct corresponding inflection is produced, even when the original word and alternative words are from different languages, or when changes are required, such as doubling a final consonant or vowel mutation (omlyd).

This is in contrast to simpler systems that copy the inflection ending from the original word to the alternative word, or use generalized rules to best-guess the correct inflection. Both these systems typically produce more incorrect inflections than the approach used by TemaSearch.

Security

Q.

The TemaSearch parameters include my account name (username), which is visible to anyone who looks at the source of the HTML for the search form. What is there to stop someone else using TemaSearch on my account?

A.

Each use of TemaSearch involves directing the browser to a search results page to display the actual search results. TemaSearch will only direct to pages registered with your account. This prevents your account being used to show results from someone else's website. Additionally, the page the user visited to launch TemaSearch is also checked against the pages registered with the account. If the page does not match any page registered with your account, TemaSearch will not be activated, and no usage is recorded against your account.

 
Q.

Despite these security measures, what should I do if I think my account is being used without permission?

A.

Contact us as soon as possible and we will investigate the issue immediately. All account accesses are logged, along with the user's IP address, which can help track down unauthorized usage.

Troubleshooting

Q.

Why do I get the error message "parameter 'XX' not defined", even though I have included it in the form?

A.

This happens when the parameter is not listed in the px parameter list. Make sure the px parameter is defined and that it includes the missing parameter in the list of TemaSearch parameters.

 
Q.

Why do I get strange characters appearing in the converted query?

A.

This is usually due to a difference in the character encodings used by the page on your site and the search engine. Check that the pcharset parameter is set to the encoding expected by your search engine.

 
Q.

I am trying to use advanced options, but they are not working: I'm still getting options enabled that I want to disable. What's going on?

A.

If you still get all the various types of alternative words, the advanced options are not taking effect. Check:

  1. That your account has access to the advanced options.
  2. That the advanced option parameters have been included in the px parameter.

 
Q.

I've made all the necessary changes, but TemaSearch still isn't working.

A.

If you are using the pnoerr parameter, set it to 0 to ensure all problems are displayed rather than being silently skipped. When you enter a new search you should see an error message indicating the likely cause of the problem.