Getting full sentences from index of word



I am trying to find a more elegant way than the code below to get a list of sentences based on the index of a word each sentence contains. For example, given a list of words, such as user names, I find the index of every occurrence of those words (this is already done by the GetWordsMatches method) and then, using the index of each match, I want to grab the whole sentence.

I have two problems. First, I can't figure out how to look backwards from the word to the previous period; I can only find the one at the end. Second, I can't figure out how to stop it from crashing if the last word match does not have a period before the end of the file.

    public static List<string> GetSentencesFromWords(List<string> Words, string FileContents)
    {
        List<string> returnList = new List<string>();
        MatchCollection mColl = GetWordsMatches(Words, FileContents);
        foreach (Match ma in mColl)
        {
            int tmpInd = ma.Index;
            int endInd = FileContents.IndexOf(".", tmpInd);
            string tmp = FileContents.Substring(tmpInd, endInd);
        }
        return returnList;
    }

Is there a more elegant way to do this?

2012-04-04 00:48
by SpectralEdge
What does GetWordsMatches do exactly? - rikitikitik 2012-04-04 00:57
It gives a MatchCollection of matches where the words from the list are located - SpectralEdge 2012-04-04 02:19


How about a LINQ powered solution:

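    public static List<string> GetSentencesFromWords(List<string> words, string fileContents)
    {
        // Split on periods, keep the pieces that contain any of the words,
        // then restore the period that Split removed.
        return fileContents.Split('.')
            .Where(s => words.Any(w => s.IndexOf(w) != -1))
            .Select(s => s.TrimStart(' ') + ".")
            .ToList();
    }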
2012-04-04 02:44
by 2Toad
That is beautiful. Thank you, it works perfectly - SpectralEdge 2012-04-04 12:06


Just quickly...

  • you can use LastIndexOf(value, startIndex) to search backwards from a given position,

  • for the 'end condition', I guess you should just add an if check on the '.' search (IndexOf returns -1 if it reaches the end without finding one),
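The two points above could be combined roughly like this. It is only a sketch: SentenceFinder and SentenceAt are hypothetical names (they are not in the question), and it assumes a sentence is simply whatever sits between two periods.

```csharp
using System;

public static class SentenceFinder
{
    // Hypothetical helper: returns the sentence surrounding the character at wordIndex.
    // Assumes wordIndex is a valid index into text (e.g. Match.Index from a regex match).
    public static string SentenceAt(string text, int wordIndex)
    {
        // Search backwards from the word for the previous period; -1 means there is none.
        int prevDot = text.LastIndexOf('.', wordIndex);
        int start = prevDot + 1; // becomes 0 when no earlier period exists

        // Search forwards for the closing period; -1 means the text ends first.
        int nextDot = text.IndexOf('.', wordIndex);
        int end = nextDot == -1 ? text.Length : nextDot + 1;

        // Substring takes (startIndex, length), not (startIndex, endIndex).
        return text.Substring(start, end - start).Trim();
    }
}
```

In the loop from the question this would be called as SentenceAt(FileContents, ma.Index), with the result added to returnList.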

...anyway, it might be better to split the file contents (with '.' as the delimiter); that way you won't have the problem with the last sentence, because the split picks up the trailing text even if it has no period. Then search for the words within each piece using IndexOf. Or I'd probably write an extension method as an enumerator (with yield return) that does all of that in one pass and returns IEnumerable, so that you could be more 'functional' and add other things to the query.
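A rough sketch of that enumerator idea, assuming plain ordinal substring matching is good enough. SentenceExtensions and SentencesContaining are made-up names for illustration, and the period removed by Split is not put back.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SentenceExtensions
{
    // Hypothetical extension method: lazily yields every sentence that contains
    // at least one of the given words. Sentences are returned trimmed and
    // without the trailing period, since Split('.') drops it.
    public static IEnumerable<string> SentencesContaining(this string text, IEnumerable<string> words)
    {
        // Materialize the words once so the sequence is not re-enumerated per sentence.
        List<string> wordList = words.ToList();

        foreach (string part in text.Split('.'))
        {
            string sentence = part.Trim();
            if (sentence.Length == 0)
                continue; // skip empty pieces, e.g. after a final period

            if (wordList.Any(w => sentence.IndexOf(w, StringComparison.Ordinal) != -1))
                yield return sentence;
        }
    }
}
```

Because the result is an IEnumerable<string>, the caller can keep composing, for example FileContents.SentencesContaining(Words).Take(10).ToList(), and nothing is scanned until the sequence is enumerated.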

hope this helps

2012-04-04 00:57
by NSGaga
I don't have much control over the files I am receiving. But, I guess I could just slap a period on the end of the string if it is lacking one. I will see if LastIndexOf will work - SpectralEdge 2012-04-04 02:21
Your endInd just gets -1; handle that, and call Substring without the length (or use FileContents.Length - tmpInd). There is also a bug: the second parameter of Substring is a length, not an index. And you could add a '.', just TrimEnd first for blanks, newlines (and dots), but fixing it properly is kind of easier :) - NSGaga 2012-04-04 11:38