Tuesday, September 29, 2009

Full text search for Entity Framework

Recently I played a bit with the "new" Entity Framework (basically the Object Relational mapper of Microsoft). It seems to be made quite good and it makes it very easy to work with objects in your application without having to map them back and forth from and to SQL. Also the possibility to launch LINQ queries over the data is very interesting because this avoids many errors and improves usability.
But I missed something: a fulltext search.
SQL Server has a very good fulltext search integrated and it would be nice to be able to perform queries through LINQ with it; but since Microsoft wanted to support also other database management systems they had to use only standard SQL for their mappings.

Faced with this problem I resolved it by writing my own pseudo fulltext search by using the possibility of creating dynamic LINQ queries.
First I will show You the code and then I explain a bit what I have done.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Text;
using System.Reflection;
using System.Data.Objects;


public static class ObjectContextExtensions
{
/*
/// <summary>
/// Searches in all string properties for the specifed search key.
/// It is also able to search for several words. If the searchKey is for example 'John Travolta' then
/// all records which contain either 'John' or 'Travolta' in some string property
/// are returned.
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="query"></param>
/// <param name="searchKey"></param>
/// <returns></returns>*/
public static IQueryable<T> FullTextSearch<T>(this IQueryable<T> queryable, string searchKey)
{
return FullTextSearch<T>(queryable, searchKey, false);
}

/*
/// <summary>
/// Searches in all string properties for the specifed search key.
/// It is also able to search for several words. If the searchKey is for example 'John Travolta' then
/// with exactMatch set to false all records which contain either 'John' or 'Travolta' in some string property
/// are returned.
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="query"></param>
/// <param name="searchKey"></param>
/// <param name="exactMatch">Specifies if only the whole word or every single word should be searched.</param>
/// <returns></returns>*/
public static IQueryable<T> FullTextSearch<T>(this IQueryable<T> queryable, string searchKey, bool exactMatch)
{
ParameterExpression parameter = Expression.Parameter(typeof(T), "c");

MethodInfo containsMethod = typeof(string).GetMethod("Contains", new Type[] { typeof(string) });
MethodInfo toStringMethod = typeof(object).GetMethod("ToString", new Type[] { });

var publicProperties = typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly).Where(p => p.PropertyType == typeof(string));
Expression orExpressions = null;

string[] searchKeyParts;
if (exactMatch)
{
searchKeyParts = new[] { searchKey };
}
else
{
searchKeyParts = searchKey.Split(' ');
}

foreach (var property in publicProperties)
{
Expression nameProperty = Expression.Property(parameter, property);
foreach (var searchKeyPart in searchKeyParts)
{
Expression searchKeyExpression = Expression.Constant(searchKeyPart);
Expression callContainsMethod = Expression.Call(nameProperty, containsMethod, searchKeyExpression);
if (orExpressions == null)
{
orExpressions = callContainsMethod;
}
else
{
orExpressions = Expression.Or(orExpressions, callContainsMethod);
}
}
}

MethodCallExpression whereCallExpression = Expression.Call(
typeof(Queryable),
"Where",
new Type[] { queryable.ElementType },
queryable.Expression,
Expression.Lambda<Func<T, bool>>(orExpressions, new ParameterExpression[] { parameter }));

return queryable.Provider.CreateQuery<T>(whereCallExpression);
}
}


I created this method as an extension method for an IQueryable<T>.
This means if you have your ObjectContext (from the Entity Framework) then when accessing all objects of some type e.g. "context.Customers" then this returns an IQueryable<Customers>
This means that You can call the above method like:

context.Customers.FullTextSearch("serachkey");

or

context.Customers.FullTextSearch("searchkey", true);

which searches only for whole words also if they contain spaces. This gives You then again an IQueryable<Customers> which contains only the filtered objects.

Ok. Now something about the code.
This method "simply" creates a lambda expression which gets all the string properties of the type T, calls the "Contains" method with the search key as parameter on them and connects them with an OR. This means that it would be similar to do something like:

context.Customers.Where(c => c.FirstName.Contains("searchkey") || c.LastName.Contains("searchkey") || c.Street.Contains("searchkey") || ...);

But this can become a very long, nasty and error prone task if You have for example 30 string columns. Exactly because of this I wrote this code :-)

Hope this helps You...

15 comments:

Peter Gfader said...

Nice work!
But what about the performance of doing this kind of full text search.

I used "full text search" feature of SQL Server in LINQ using this blog post
http://sqlblogcasts.com/blogs/simons/archive/2008/12/18/LINQ-to-SQL---Enabling-Fulltext-searching.aspx
Fast and easy

Anyway, how did you find it to work with Expressions?

Unknown said...

Hello Peter,
thanks :-)

I did not check how the performance of this approach is. I tested it only on some data and it was fast enough for my needs. I would have to test it on a very large amount of data; which I will do as soon as I have some time...

Well, I think that the fulltext search which is provided by SQL Server should be much faster und offers also more features such as finding similar words.
The only problem with that is that it is not supported by the ADO.Net Entity Framework (LINQ to Objects) because not all DBMS systems offer a fulltext search feature and to be compatible also with them the EF uses only standard SQL.

Yes, LINQ to SQL may offer it since it works "only" with SQL Server and can therefore get all the power and features out of it :-)

Initially it was a little bit challenging to understand how Expressions and Expressiontrees work. The problem was to understand how the different types of expressions work and how the they are combined and evaluated; because this is not always easy and obvious to understand.
For example Method-Variables which are passed to an Expression Tree are not there as values but they are a field of the class in which the method is and if you want the value then you have to get the value of that field.
I found it also easier to evaluate an existing expression tree (see my last post about ORMapper) than creating my own Expression tree.

Anonymous said...

Just a warning that, as Peter notes, this is not full text search and can have a severe impact on the performance of your app.
For production, I'd suggest making use of a stored procedure to get access to SQL's native full text search...

Unknown said...

Hi Anonymous,
thanks for the warning; but I know that this is not a real full text search and as described above I know also that the performance is not the best one. But this is, as far as I know, the only solution for a, let's call it extended search, for the entity framework. If You have a better solution to perform this task, please share Your knowledge with us.

What exactly do You mean with "SQL's native fulltext search"?

vslee said...

how about searching through integer fields (or other non-string fields)? When I try doing .ToString().Contains() it throws an error. Any ideas?

Unknown said...

Hi vslee,
well the big problem is, that (as far as I know) SQL does not support to search for parts of integers (contains) and therefore LINQ also doesn't support it. My code creates "only" a LINQ statement which is then translated by the .NET Framework to SQL and executed. So you have no influence on how it is translated or which statements ist supports. However you could have a look at http://manfred-ramoser.blogspot.com/2009/08/linq-expression-trees-and-ormapper.html. With that code you can "easily" influence how linq statements are converted to SQL. Maybe you could use something like:
SELECT * FROM yourTable
WHERE CAST(intField AS NVARCHAR) LIKE '%6%'

Unknown said...

Manfred,

I think, Anonymous on December 16, 2009 meant FREETEXT(col_name, "value")

Correct me if I am wrong, but I don't think you are leveraging full-text search capabilities of SQL Server 2008. When working with EF1 we resorted to regular ADO.NET SqlDataReader in order to invoke full-text search queries; I was hoping that we can use EF4 facilities to "pass-through" sql statement to the database...

Md. Rezoanul Alam said...

Went like a charm..Exactly what I was looking for..Kewl Work...

Shabir said...

Great Job....

Robert Ginsburg said...

I liked and used your search extension several times. I ended up also needing access to FreeText and Contains methods in EF and ended up building this http://www.balsamicsolutions.com/Blog/Post/2/Full-text-search-in-Microsoft%27s-Entity-Framework

Anonymous said...

Works great with one word entered in search box but I can't seem to get it to search properly with more than one word. The "exactmatch" bool is incorrectly returning true. Any ideas how to fix this?

Anonymous said...

how do you get this to do "and" instead of or???

Mario Binder said...

@Anonymous
If you'll combine all properties with an AND clause, you can use
Expression.And instead Expression.OR.

But when you combine your searchstrings with AND you should compare the currently searchkey with the first entry of searchkeys. The first will be combined with OR, and the next strings with AND. like this:

orExpressions = searchKeyPart == searchKeyParts[0] ? Expression.Or(orExpressions, callContainsMethod) : Expression.And(orExpressions, callContainsMethod);

I hope I was of help to you


Nikolay Angelov said...

Greate article man. It is really helpful. This is what I was looking for.

Samuel said...

Thank you very much. Exactly what I was looking for! For small data sets, this solution is great. Btw. DevExpress does exactly the same with their ORM (XPO) within their RAD Framework (XAF).