Using SharePoint search in an ASP.NET application with noise filtering

In this post, I'll detail the steps & code used to get this working. I used this method to add search capabilities to a site implemented with MCMS 2002.

Last updated: July 15, 2004

Most content site owners want visitors to be able to search the content on their sites. If you have access to Microsoft SharePoint Portal Server 2003 (SPS 2003), you can use its advanced search features to create an index of your site. You query the index through the SharePoint Query Service web service by submitting a specially crafted XML query (the syntax of that query gave me the most trouble).

In this article, I’ll detail the steps and include some of the code used to get this working. I used this method to add search capabilities to a site implemented with Microsoft Content Management Server 2002 (MCMS 2002). There are quite a few steps involved in implementing this process:

  1. Create a content index in SharePoint
  2. Create a content source in SharePoint
  3. Build a search submission page
  4. Implement pre-search logic (such as removing noise words from keyword searches)
  5. Compose logic to craft the search query
  6. Submit the query
  7. Render the dataset returned to present the results

Create a content index in SharePoint

Go into the SharePoint Portal’s Site Settings and click Configure Search and Indexing. Now click Add Content Index. The name you give the index is what you’ll use in the query.

Create a content source in SharePoint

Now that an index is created, you’ll need a content source that tells SharePoint what to crawl to populate it. Go back to the Configure Search and Indexing page and click Add content source. Select the name of the content index you created above, specify that it’s a website, and click Next. Now enter the URL of the site, a short description, the crawl configuration you want, and select the source group you specified when you created the content index.

Now that’s finished, execute a full update of the website so SharePoint will start crawling and have its index populated by the time we’re ready to fire the search off.

Build a search submission page

Nothing fancy here, just an ASP.NET page with an input box for keywords and a submit button. I also added some advanced search options that I use to further filter my search.

For example, in an MCMS 2002 site, you can base your URLs on the site channel hierarchy. I used that information to filter by products, services, or news releases, or to limit results to specific divisions. That’s something you need to determine based on your needs.

Add a web reference to your ASP.NET project pointing to the SharePoint Query Service. It can be found at /_vti_bin/search.asmx off the root of the portal’s website.

Implement pre-search logic

Wire up the submit button’s click event to trigger the search. Before crafting the XML query, you might want (I did) to strip any noise words from the keywords submitted. This is actually a lot easier than you might think. I took the English noise file from SharePoint (C:\Program Files\SharePoint Portal Server\DATA\Config\noiseeng.txt) and modified it so that every word or letter is listed on a separate line. Then I loaded the list of noise words into an ADO.NET DataTable (one column, one record per word) and added it to the ASP.NET cache with a dependency on the noise file.

private void InitNoiseTable() {
  // try to get the noise datatable from the cache
  DataTable cached = (DataTable)HttpContext.Current.Cache[c_noiseCacheKey];
  if (cached == null) {
    // not in cache: build the noise table from the noise file
    cached = this.LoadNoiseFile();
    // add to cache; the entry is dropped when the noise file changes
    HttpContext.Current.Cache.Insert(c_noiseCacheKey,
                                     cached,
                                     new CacheDependency(this.m_noiseFilePath));
  }
  this.m_noiseWordTable = cached;
}
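
LoadNoiseFile is called above but not shown. A minimal sketch of what such a loader could look like, assuming the noise file has already been reworked to hold one word per line as described; the class name, the "Word" column name, and the path parameter are hypothetical, not taken from the original component:

```csharp
using System;
using System.Data;
using System.IO;

public static class NoiseFileLoader
{
    // Hypothetical helper: reads one noise word per line into a
    // single-column DataTable (one record per word).
    public static DataTable LoadNoiseFile(string path)
    {
        DataTable table = new DataTable("NoiseWords");
        DataColumn word = table.Columns.Add("Word", typeof(string));
        // A primary key makes lookups via Rows.Contains possible.
        table.PrimaryKey = new DataColumn[] { word };

        foreach (string line in File.ReadAllLines(path))
        {
            string trimmed = line.Trim().ToLowerInvariant();
            // Skip blank lines and duplicates (the key must stay unique).
            if (trimmed.Length > 0 && !table.Rows.Contains(trimmed))
                table.Rows.Add(trimmed);
        }
        return table;
    }
}
```

With the primary key in place, checking whether a keyword is noise comes down to a single table.Rows.Contains(keyword) call.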

Now that I have a reference to the noise list, I need a list of all the keywords that were submitted. In my case, I only cared about alphanumeric characters and the spaces that separate the words. After cleaning out the non-alphanumeric characters, I split the result into an array and check each word to see if it’s a noise word. If it is, I remove it. At this point, I have a collection of keywords without noise.

public string BuildMssqlftQuery(string searchParm,
                                string searchScope,
                                Components.AdvancedSearchOptions advancedOptions) {
  StringBuilder keywordList = new StringBuilder();

  // clean out noise
  Stack cleanKeywords = this.RemoveNoise(searchParm,
                          ConfigurationSettings.AppSettings["SearchNoiseFile"].ToString());

  bool firstPass = true;
  foreach (object keyword in cleanKeywords) {
    // if this isn't the first keyword, add the conjunction
    if (!firstPass)
      keywordList.Insert(0, " AND ");

    firstPass = false;

    // add keyword to search on
    keywordList.Insert(0, keyword.ToString());
  }
  // method continued below
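
The RemoveNoise method called above is not listed in this post. The steps it performs (strip non-alphanumeric characters, split on spaces, drop noise words) can be sketched roughly as follows. This is an illustration only: the class name, the signature, and the use of a plain collection of lowercase noise words instead of the cached DataTable are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NoiseFilter
{
    // Hypothetical stand-in for RemoveNoise: returns the submitted
    // keywords with punctuation and noise words removed.
    // noiseWords is assumed to contain lowercase entries.
    public static List<string> RemoveNoise(string searchText,
                                           ICollection<string> noiseWords)
    {
        // Replace anything that isn't a letter, digit, or whitespace.
        string cleaned = Regex.Replace(searchText, @"[^A-Za-z0-9\s]", " ");

        List<string> keywords = new List<string>();
        string[] words = cleaned.Split(new char[] { ' ', '\t', '\r', '\n' },
                                       StringSplitOptions.RemoveEmptyEntries);
        foreach (string word in words)
        {
            // Keep the word only if it isn't in the noise list.
            if (!noiseWords.Contains(word.ToLowerInvariant()))
                keywords.Add(word);
        }
        return keywords;
    }
}
```

Joining the result with string.Join(" AND ", keywords.ToArray()) gives the same kind of AND-separated keyword list that the loop above builds.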

Compose logic to craft the search query

At this point, we have a clean list of keywords, so I’m ready to create the XML search query, which is sent to the SharePoint Query Service as an XML request. The first step is to string all the keywords together into a clean SQL-style WHERE clause, so I joined them with ANDs. Then it’s time to build the XML query string… I used a StringBuilder. The first code block below contains the templates for the query. The second code block is where I build the query:

<!-- CONSTANTS -->
// XML construct template to send to SPSQueryService web service
private const string c_xmlQueryConstruct = "<?xml version=\"1.0\" encoding=\"utf-8\" ?>"
              + "<QueryPacket xmlns=\"urn:Microsoft.Search.Query\" Revision=\"1000\">"
              + "<Query domain=\"QDomain\">"
              + "<SupportedFormats>"
              + "<Format>urn:Microsoft.Search.Response.Document.Document"
              + "</Format></SupportedFormats>"
              + "<Context></Context><Range></Range>"
              + "</Query></QueryPacket>";

// XML construct template that will contain the MSSQLFT query
private const string c_xmlContextConstruct = "<Context>"
              + "<QueryText language=\"en-US\" type=\"MSSQLFT\">"
              + "<![CDATA[^QUERY^]]></QueryText></Context>";

// XML construct holding the number of results to return
private const string c_xmlRangeConstruct = "<Range><StartAt>1</StartAt>"
              + "<Count></Count></Range>";

// max results returned by the query
private const int c_maxResults = 50;

<!-- BUILD QUERY -->
StringBuilder mssqlftQuery = new StringBuilder();
string wsXmlQuery = c_xmlQueryConstruct;

// SELECT
mssqlftQuery.Append("SELECT ");
mssqlftQuery.Append("\"DAV:href\", ");
mssqlftQuery.Append("\"urn:schemas.microsoft.com:fulltextqueryinfo:rank\"");

// FROM (the leading spaces keep the clauses from running together)
mssqlftQuery.Append(" FROM " + searchScope + "..SCOPE()");

// WHERE
mssqlftQuery.Append(" WHERE CONTAINS(*, '" + keywordList + "')");

// ORDER BY
mssqlftQuery.Append(" ORDER BY \"urn:schemas.microsoft.com:fulltextqueryinfo:rank\" DESC");

// build & add range of responses
string x = c_xmlRangeConstruct;
x = x.Replace("<Count></Count>", "<Count>" + c_maxResults + "</Count>");
wsXmlQuery = wsXmlQuery.Replace("<Range></Range>", x);

// build & add query
x = c_xmlContextConstruct;
x = x.Replace("^QUERY^", mssqlftQuery.ToString());
wsXmlQuery = wsXmlQuery.Replace("<Context></Context>", x);

return wsXmlQuery;
}

You’ll notice in the FROM part of the query I concatenated the scope. This is where you need to put your content index name. So if your index name was Marketing_Internet_Site, your FROM clause should be: FROM Marketing_Internet_Site..SCOPE()

Here’s what the resulting XML should look like:

<?xml version="1.0" encoding="utf-8" ?>
<QueryPacket xmlns="urn:Microsoft.Search.Query" Revision="1000">
  <Query domain="QDomain">
    <SupportedFormats>
      <Format>urn:Microsoft.Search.Response.Document.Document</Format>
    </SupportedFormats>
    <Context>
      <QueryText language="en-US" type="MSSQLFT">
        <![CDATA[
        SELECT
          "DAV:href",
          "urn:schemas.microsoft.com:fulltextqueryinfo:rank"
        FROM FIS_Site..SCOPE()
        WHERE CONTAINS(*, '~~KEYWORDS_GO_HERE~~')
        ORDER BY "urn:schemas.microsoft.com:fulltextqueryinfo:rank" DESC
        ]]>
      </QueryText>
    </Context>
    <Range>
      <StartAt>1</StartAt>
      <Count>50</Count>
    </Range>
  </Query>
</QueryPacket>

You may notice that my SELECT is also pretty slim. This is in part because all I need is the path to the page in MCMS 2002 so I didn’t want to grab more than necessary. You can see all the properties available to select from by going to “Manage Properties From Crawled Documents” in the SharePoint Portal Site Settings.

Submit the query

Now that the query is built, we just need to submit it (yes, the project switches to VB.NET… all the search logic is in a business component which I wrote in C#):

<!-- SEARCH CODE -->
Dim searchWS As SPSSearchWS.QueryService = New SPSSearchWS.QueryService
'craft the XML MSSQLFT search query
Dim scope As String = "Marketing_Internet_Site"
Dim query As String = search.BuildMssqlftQuery(Me.SearchQueryTextBox.Text, scope, advOptions)
Dim ds As DataSet = searchWS.QueryEx(query)

You’ll notice that the call to BuildMssqlftQuery takes three parameters. The first contains all the keywords, the second contains the search scope, and the last is my object containing all the advanced search options the user specified. As the QueryEx call shows, submitting the query is easy.

Render the dataset returned to present the results

I’ll let you figure this out… it’s just a dataset after all.
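
If you want a starting point, here is a minimal sketch that turns the returned DataSet into an HTML list of links. It is written in C# to match the search component rather than the VB.NET page. It assumes the first table holds the results and that its first column carries the first property in the SELECT list (the page URL, "DAV:href"); check the columns your own query returns before relying on that. The class and method names are hypothetical.

```csharp
using System;
using System.Data;
using System.Net;
using System.Text;

public static class ResultRenderer
{
    // Hypothetical helper: renders each result row as a link.
    public static string RenderResults(DataSet results)
    {
        if (results == null || results.Tables.Count == 0
            || results.Tables[0].Rows.Count == 0)
            return "<p>No results found.</p>";

        StringBuilder html = new StringBuilder("<ul>");
        foreach (DataRow row in results.Tables[0].Rows)
        {
            // Assumption: column 0 is the URL of the matching page.
            string url = WebUtility.HtmlEncode(Convert.ToString(row[0]));
            html.Append("<li><a href=\"").Append(url).Append("\">")
                .Append(url).Append("</a></li>");
        }
        html.Append("</ul>");
        return html.ToString();
    }
}
```

The markup could then be assigned to a Literal control on the results page, or the DataSet could be bound directly to a Repeater or DataGrid instead.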

If you need help creating the content index and/or content source within SharePoint Portal Server, read this article. The author explains how he implemented searching a content source from an ASP.NET site via the web service, and he goes into more detail, with screenshots, on creating the index and source.
